SUMMARIES OF PREVIOUS INVESTIGATIONS


The new-type or objective test is a fruitful subject for research.
The number of investigations appearing in educational literature
makes summaries of such studies highly desirable. The purpose of
this article is to summarize the important investigations which have
appeared since the previous summary was prepared by Lee and
Symonds.* Practically all the articles referred to were published in
the two-year period from October 1931 to October 1933; the exceptions
are a few that were unintentionally omitted from the first summary.

Several summaries dealing directly or indirectly with new-type
tests have appeared. The February 1933 number of the Review of
Educational Research was devoted to the topic “Educational Tests and
Their Uses.” Wood, Lindquist, and Anderson reviewed basic
considerations. Osburn summarized sixty-nine references on the
selection of test items. One hundred eighty-eight researches dealing
with statistical procedures were treated in a most comprehensive
manner by Ruch. Trabue reviewed fifty references dealing with
testing for guidance, limiting his treatment to the vocational phase at
the adult level and neglecting researches dealing with educational
guidance. Stenquist covered the recent developments in the use of
tests, listing one hundred fifty-one references. A list of selected
references on statistics and the theory of test construction was
prepared by Holzinger and Swineford; this list is in the series
published by the School Review and Elementary School Journal.
Methods for improving the measuring qualities of the essay
examination have been summarized by Sims. Baker summarized the
work done on the construction and statistical interpretation of
psychological tests. A comparison of the studies dealing with the
various forms of objective questions was the subject of an excellent
summary by Kinney and Eurich. They summarized studies of
objective tests under the headings of validity, reliability, time
required for administration, relative difficulty, effect on student
attitude, pedagogic value, and needed investigations. They included
a bibliography of thirty-three studies, arranged chronologically, with
references from May 1921 to April 1931.

* The number indicates the reference at the end of the article.

162 The Journal of Educational Psychology
Lee and Symonds summarized the investigations appearing
between the publication of The Objective or New-type Examination
by Ruch in 1929 and October 1931. Seventy-three references were
discussed under the following topics: teaching values of new-type
tests, comparative validities, comparative reliabilities, scoring methods,
special problems peculiar to the true-false test, students’ attitudes
toward testing, new types of tests, miscellaneous problems, and
important discussional references. The organization of the present
article follows the previous one very closely.
TEACHING VALUES OF NEW-TYPE TESTS
Giving Tests.—It is possible that the teaching values of tests may
be increased by the careful use of test results. Recent studies of this
topic are rather meager, and this lack of research is disappointing,
for much could profitably be done. Jersild studied the value of
various types of questions when used as pre-examinations. He found
that the true-false test had little or no learning value when used as a
pre-test. The groups which took the multiple-choice and essay tests
as pre-tests did much better than the control group. Hertzberg,
Heilman, and Leuenberger studied the value of objective tests as
study guides in educational psychology. They gave three tests to
their control and experimental groups. For the first two tests the
experimental group used objective questions as study aids, but for
the final test no aids were used. The experimental group was
superior to the control group on the first two tests, but there was no
difference between the groups on the final test. They concluded that
objective tests as study guides “Do not aid the students in achievement
that requires delayed recall.” White told an experimental group that
a final examination would be given using the same true-false questions
which were used in short tests during the term; a control group was
told that there would be no final. He found what would seem to be an
obvious result: the experimental group, which had been told that there
would be an examination and which was given the exact questions,
did better on the final than the group which did not expect a final test.

Kulp gave a series of weekly examinations to graduate students.
At mid-semester, those in the upper half on the mid-semester
examination were excused from the later tests. He found that the
poor students did much better on the final examination, in relation to
the better students, than they had done on the mid-semester examina-
tion. Kitch gave a series of practice tests to an experimental group
in biology. The experimental group showed critical ratios in its
favor ranging from 1.34 to 4.13 for tests over four chapters. Prac-
tically all pupils favored the use of practice tests.
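Kitch reports his results as critical ratios. As a rough illustration of what such a figure expresses, the sketch below computes a critical ratio in the usual way: the difference between two group means divided by the standard error of that difference, assuming independent groups and taking the standard error of each mean as σ/√N. The function name and all figures are hypothetical; Kitch's own means, standard deviations, and group sizes are not given here.

```python
import math


def critical_ratio(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Critical ratio for the difference between two independent group means.

    Assumes the standard error of each mean is sd / sqrt(n) and that the
    groups are uncorrelated, so the two standard errors combine as a root
    sum of squares.
    """
    se_a = sd_a / math.sqrt(n_a)
    se_b = sd_b / math.sqrt(n_b)
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return (mean_a - mean_b) / se_diff


# Hypothetical figures: experimental vs. control group on one chapter test.
cr = critical_ratio(mean_a=30.0, sd_a=6.0, n_a=36, mean_b=27.0, sd_b=6.0, n_b=36)
print(f"critical ratio = {cr:.2f}")
```

With these invented figures each standard error is 1.0, so the ratio is 3/√2, about 2.12. The larger the ratio, the less likely the difference between the groups is a chance result.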

Knowledge of Results.—Incentives could be used more profitably in
testing programs than they are at present. Brown summarized
previous studies of incentives and, in an excellent study, shows
rather definitely that knowledge of test results is an incentive which
increases the achievement of pupils. The conclusion drawn from the
study for schoolroom practice is: “No tests should be given unless
the children know the results of previous test.”

Thorndike in a recent study of incentives shows that:


A satisfying after effect strengthens somewhat the connection to which it is 
attached, even though it is irrelevant to the purpose in the interest of which the 
connection was made and highly incongruous with the cravings and expectations 
of the person at the time. 


Concluding a summary of research studies dealing with stimulating 
learning activity, Monroe and Engelhard state:


The findings of the studies referred to are almost unanimously in favor of the 
contention that knowledge of progress of learning is an effective stimulus. 
Scoring Methods.—Two studies previously reported indicated that
having pupils correct their own papers increased the instructional
values of tests. One of the authors of this summary (Lee) suggests
that it is possible to have pupils correct their papers and at the same
time prevent cheating. Where objective tests are used, this can be
done by having an answer sheet accompany the test. The pupils
answer the test as usual on the mimeographed test papers, but five
minutes before time is up the teacher tells them to copy their answers
onto the answer sheet. When all have copied the answers, the answer
sheets are turned in to the teacher and the test papers are kept by the
pupils. The teacher can then read the answers and the pupils correct
the papers. Cheating is prevented because the teacher holds the
answer sheets, which were turned in before the correcting began.
This device enables pupils both to correct their papers and to know
their results. Since both of these factors have been shown to improve
learning, the device should increase the instructional value of tests.

Curtis and Darling reported a second study on scoring papers.
They studied four methods. Having the pupils correct the incorrect
items on their own papers gave the best results for both immediate
and delayed recall. Two suggestions have appeared for eliminating
teacher scoring of papers, and each will probably increase the teaching
value of tests. Smeltzer suggested the use of numbered answer
sheets to correspond with the test papers. After the answer sheets
have been filled out, they are collected, exchanged between rows,
corrected, and returned. Before the papers are returned, an
item-by-item diagnosis can be made should the teacher desire one.
Jeep suggests an interesting method similar to the one suggested by
Lee. His plan involves the following six steps:
1. After taking the test, the pupil copies the answers onto an answer sheet.
2. The answer sheet is turned in and the test paper kept by the student.
3. The student reviews the answers, making any desired changes on
the test.
4. At a later meeting of the class, pupils copy their revised answers
onto a second sheet, which they hand in.
5. The class discusses the answers, scores the test papers, and
hands them in.
6. The first answer sheet is distributed and scored.
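Smeltzer's collected answer sheets lend themselves to the item-by-item diagnosis he mentions. A minimal sketch, assuming (hypothetically) that each answer sheet has been transcribed as a mapping from item number to response and that an item left blank counts as an error; neither assumption comes from Smeltzer's description.

```python
from collections import Counter


def item_error_counts(answer_sheets, key):
    """Count how many answer sheets miss each item.

    answer_sheets: list of dicts mapping item number -> pupil's response.
    key: dict mapping item number -> correct response.
    An item absent from a sheet (left blank) is counted as an error.
    """
    errors = Counter({item: 0 for item in key})
    for sheet in answer_sheets:
        for item, correct in key.items():
            if sheet.get(item) != correct:
                errors[item] += 1
    return errors


key = {1: "T", 2: "F", 3: "T"}
sheets = [
    {1: "T", 2: "T", 3: "T"},
    {1: "F", 2: "T", 3: "T"},
    {1: "T", 2: "F"},  # item 3 left blank
]
# Items listed from most to least often missed, e.g. for class discussion.
for item, missed in item_error_counts(sheets, key).most_common():
    print(f"item {item}: missed on {missed} of {len(sheets)} sheets")
```

Listing the most frequently missed items first would let the teacher decide which questions deserve discussion before the sheets are returned.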
Summary of Teaching Values.—These findings should prove
suggestive for research on the teaching values of tests. There also
seems to be a possibility that research may show that tests so easy
that all pupils do very well on them may be excellent instructional
devices. The following techniques seem to increase the instructional
values of tests:

1. The giving of multiple-choice and essay questions as pre-tests. 

2. Using objective tests as assignments. 

3. Informing pupils of their results on tests. 

4. The use of satisfying after effects. 

5. Having the pupils correct errors on their own papers. 

In addition to these findings, the previous summary indicated
that informing pupils that a final examination is to be given, giving a 
number of short tests throughout the course, and having the pupils 
score their own papers add to the instructional value of tests. 


COMPARATIVE VALIDITIES 


Various Types.—The measurement of the comparative validity 
of each type of objective question or each type of arrangement of 
these questions into tests is one of the most important problems in 
the field of objective testing. Although it is one of the most important
problems, it is one of the most poorly investigated. The difficulty of
determining satisfactory criteria by which to measure validity, and the
number of rather narrow, unrelated studies, are evidence for this statement.

Most of the recent studies of validity deal with new types of ques-
tions or with suggestions offered by the various authors. Holmes and
Heidbreder suggest a type of true-false test in which the pupil writes
down the wrong word of each false statement. Their test would
contain such questions as the following:


(native) 1. Habits are probably the most important native factors of advantage 
in attention in adults. 





As measures of the validity of the test, they showed a consistent
decrease in mean scores from the “A” pupils to the “F” pupils; they
also obtained high correlations with the other tests of the course.

Haven and Copeland, in studying variations of the multiple-choice
test, offer evidence which shows:


The indication of more than one choice on a multiple response test of a few 
carefully selected items is of more value for prognostication than a test of many 
items in which only the first choice is recorded. 


In their study the recording of a best answer and a second best answer 
for each multiple-choice item seemed most effective. 
The oral true-false test has been studied, but Sims and Knox
are among the first to study the effect of presenting the multiple-choice
test orally. In a careful study they gave four forms of the Thorndike
Test of Word Knowledge in four ways. They gave one form in the
regular manner, which will be referred to as the visual presentation.
They then presented three other forms of the test orally, using in one
case only three alternatives, in another four alternatives, and in the
third the regular five alternatives given in the test. They found
that the multiple-choice test when presented orally is only slightly
more difficult than when presented visually. There was no great
loss of reliability or validity. The use of five alternatives seems to be
somewhat superior to the use of three or four.

Weaver and Traxler, in a rather limited experiment, found that
objective tests (twenty minutes of working time) and essay tests
(forty-five minutes of working time) correlated equally well with the
two criteria of five objective tests and five essay tests. Their
study was criticized by Saucier on the ground that their essay questions
required such definite answers that they were essentially objective-type
questions. It is interesting to note in this connection that Sims
analyzed four hundred fifty-eight typical essay questions and found
that only thirty per cent were discussional questions, while thirty-five
per cent were simple recall questions and thirty-five per cent were
short-answer questions.

Stalnaker described the difficulties in constructing a valid test
of acceptable and reliable habits of writing. The usual type of
composition was found to be most unsatisfactory. Leighton,
with very limited data, attempted to show that essay tests are neither
subjective nor a poor sampling device. His statistical evidence was
not especially convincing.

Hurd compared short-answer and multiple-choice tests covering
identical subject content. He found a correlation of .78 between the
two types. Had he corrected his results for attenuation, he would
have found a correlation of .90, indicating that the two tests were
measuring practically the same thing. Price showed that
multiple-choice items of four responses correlated slightly higher with
semester history grades than did a Right-Wrong test or a True-False
test. The testing times were practically identical. Gilliland and
Misbach found that “grades based on objective tests in general
psychology are found to correlate consistently more highly with
intelligence test scores and with semester grade averages than grades
based on equally long essay tests.”
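The figure of .90 cited for Hurd corresponds to the usual correction for attenuation, which divides the obtained correlation by the square root of the product of the two tests' reliability coefficients. Hurd's reliabilities are not reported here, so the sketch below works backward: it shows what product of reliabilities the step from .78 to .90 implies, then checks the result with hypothetical reliabilities of .867 for each test.

```python
import math


def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Estimated correlation between the two tests if both were perfectly
    reliable: the obtained r divided by the square root of the product of
    the reliability coefficients."""
    return r_xy / math.sqrt(r_xx * r_yy)


def implied_reliability_product(r_obtained, r_corrected):
    """Product r_xx * r_yy that would turn r_obtained into r_corrected."""
    return (r_obtained / r_corrected) ** 2


# Product of reliabilities implied by correcting .78 up to .90.
print(round(implied_reliability_product(0.78, 0.90), 3))
# Hypothetical reliabilities of .867 for each test.
print(round(correct_for_attenuation(0.78, 0.867, 0.867), 2))
```

The step from .78 to .90 implies a product of reliabilities of about .751, which reliabilities of roughly .87 for each test would supply.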

Length.—Turney showed that short tests having only a limited
number of items have very poor validity, but that cumulating the
scores on a number of such tests gives a valid measure. He also shows
that the completion and one-word-answer tests are more valid than
the true-false tests. He used tests which included twenty to forty
items. One of the writers (Lee), in a recent survey of the testing
practices of some sixteen hundred secondary teachers, found that the
median number of objective questions which they include in their
tests is thirty-one. According to these two findings it would appear
that the objective tests which teachers give are too short to be valid, but
that they can be made sufficiently valid if scores from a number of
such tests are combined. Lindquist and Cook outline a method for
determining the “optimum administrative time” for any one test.

Home-made vs. Standardized Tests.—Previous studies have indi-
cated that teacher-made objective tests can be as valid as standardized
tests and in some cases even more so. Perry and Broom concur with
these findings and show that “Carefully constructed informal objective
tests prepared by classroom teachers may be as valid and as reliable
as the standardized tests in the field of foods.”

Difficulty of Items.—T. G. Thurstone has published her excellent
study of the validity of items of varying degrees of difficulty in a
more accessible source. Her study was reviewed in the previous
summary. Briefly, the most valid questions were those of approxi-
mately fifty per cent difficulty. Horst showed that a few alternatives
on a multiple-choice test, equal in difficulty, are more valid than a
larger number of choices presenting a wide range of difficulty.

In a most interesting study Smith had experienced and inexperi-
enced teachers and test-construction experts rate the difficulty of the
questions on the various parts of the Stanford Achievement Test.
He found that experienced teachers could estimate difficulty most
accurately, test experts next, and inexperienced teachers least
accurately. Arranged in order, the difficulty of Arithmetic Problems
was easiest to rate, followed by Word Meaning, Physiology, History,
Geography, Literature, and finally Language Usage, which was the
most difficult.

Evaluating Items.—Closely related to difficulty is the extent to
which the item discriminates between pupils of various levels of ability.
There have been numerous suggestions as to ways of validating
items. Some of the suggested schemes are quite simple, others very
complex. Lentz, Hirshstein, and Finch studied a number of these
methods. They conclude that a comparison of the responses of the
upper and lower third of the class or some such fraction is not only the
easiest method of evaluating test items but the best. The best 
method of studying items would be to keep all test papers until the 
end of the semester. Then pick out the papers of the top third or 
fourth of the pupils in total achievement and the papers of an equal 
number of the poorest. The number of pupils getting each item correct 
in each group could then be compared. The study by Lindquist and 
Cook is most suggestive of the careful approach which needs to be
made to this problem. They tried out five types of discrimination 
and include recommendations as to the value of each type. Index B 
seemed to be the best for the least effort:

\[ \text{Index B} = \frac{U - L}{\ldots} \]

where U is the per cent of correct spelling in the upper ¼ of the group
and L is the per cent of correct spelling in the lower ¼ of the group.
Lindquist and Anderson conclude “that the ideal test would consist of
items of higher discriminating power distributed evenly over the
difficulty scale.”
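The upper-and-lower-group comparison recommended by Lentz, Hirshstein, and Finch can be sketched as follows. This is an illustration under stated assumptions, not their procedure in detail: each paper is represented as a total score plus a list of right/wrong marks, the group size is the number of papers divided by a chosen divisor (3 for thirds, 4 for fourths), and ties in total score are broken arbitrarily. The sketch only tallies correct responses in each group; it does not attempt Lindquist and Cook's Index B.

```python
def upper_lower_counts(papers, divisor=3):
    """Tally correct responses to each item in the upper and lower groups.

    papers: list of (total_score, marks) pairs, where marks[i] is True when
            item i was answered correctly.
    divisor: 3 compares top and bottom thirds, 4 compares fourths.
    Returns a list of (upper_right, lower_right, difference) per item.
    """
    ranked = sorted(papers, key=lambda paper: paper[0], reverse=True)
    size = max(1, len(ranked) // divisor)
    upper, lower = ranked[:size], ranked[-size:]
    n_items = len(ranked[0][1])
    counts = []
    for i in range(n_items):
        up = sum(1 for _, marks in upper if marks[i])
        low = sum(1 for _, marks in lower if marks[i])
        counts.append((up, low, up - low))
    return counts


# Hypothetical papers from a class of six on a two-item test.
papers = [
    (10, [True, True]),
    (9, [True, False]),
    (7, [True, True]),
    (5, [False, True]),
    (3, [False, False]),
    (2, [True, False]),
]
for item, (up, low, diff) in enumerate(upper_lower_counts(papers), start=1):
    print(f"item {item}: upper {up}, lower {low}, difference {diff}")
```

An item answered correctly far more often by the upper group than by the lower would seem to discriminate well; a difference near zero or below marks an item worth revising.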

Effect of Intelligence.—Edmiston obtained lower correlations
between various types of questions (modified, essay, true-false, com- 
pletion, and multiple-choice tests) in the case of retarded pupils than 
for normal pupils. 

Summary.—These studies on validity seem to indicate that: 

1. The wrong word adaptation of the true-false test is valid. 

2. The indication of a best and a second best answer improves 
the validity of the multiple-choice test. 

3. The presentation of the multiple-choice test in oral form is as 
valid as in written form. 

4. Essay questions are apt to include a relatively small percentage 
of discussional questions. 

5. Objective type questions seem to be more valid than essay 
questions judged on the basis of semester marks. 

6. Short tests have poor validity but can be sufficiently improved 
if the scores on a number of such tests are cumulated. 

7. The number of items which teachers include in their self-made
tests is too small for the tests to have a desirable degree of validity.
8. Teacher-made objective tests can be as valid as standardized
tests.

9. The individual items in a test should be of such difficulty as to
be passed by about fifty per cent of the group, in order for the test
to have its greatest validity.

10. The difficulty of test items cannot be judged by either teachers 
or test experts. 

In addition to these findings, the previous summary indicated
that objective tests, with the exception of the true-false, are slightly
more valid than essay tests and correlate more highly with intelligence
tests; that the completion seems to be the most valid and the true-false
the least valid of the objective questions; and that the objective test is
four to five times as good a sampling device as the essay test.


COMPARATIVE RELIABILITIES


Various Types.—Most of the findings dealing with reliability are
in connection with the various types of tests for which validity was
reported in the previous section. Holmes and Heidbreder report
reliability coefficients as high for the wrong-word true-false test as
for analogies, matching, and single-answer tests. Sims and Knox
show that there is little difference in reliability between the oral and
written forms of the multiple-choice test. They do state that the
oral form takes about one-sixth longer to give. Turney shows that
tests containing few items had reliabilities of .30 to .69, but when the
scores were cumulated the reliability rose to over .80.

An excellent theoretical discussion of test reliability by Dunlap
illustrated the value of the tetrad technique in determining whether
or not four forms of a test measure the same thing. Orleans and
Symonds showed that teacher-made tests and standardized tests
given during the middle of the term have the same reliability. The
study also tended to show that when the pupils know only part of the
items, the reliability of the test is decreased.

Difficulty of Items.—Holzinger furnishes a formula whereby the
reliability of a single item can be determined.

Summary.—The studies of reliability seem to indicate that: 

1. The wrong-word adaptation of the true-false test is sufficiently
reliable for use.

2. The oral multiple-choice test is as reliable as the written. 

3. Tests containing only a few items have low reliability but the 
reliability is increased if the scores are cumulated. 
4. It is possible to determine the reliability of a single test item.

5. Methods have been suggested for the use of the tetrad technique 
in determining reliability. 

In addition to these findings, the previous summary indicated
that objective tests have higher reliability than essay tests; that there
is little difference in the reliability of objective questions when
considered on the basis of equal working times; that modified
true-false tests are more reliable than the usual types; that repeated
gradings of essay papers by the same persons are unreliable; and that
tests having items of practically the same difficulty are most reliable.


SCORING METHODS 


Effect of Weighting.—Sufficient studies have been made on the 
effect of weighting items on objective tests, either according to dif- 
ficulty or according to instructors’ judgments. It would seem that 
time could be more profitably spent on other researches. Shouse
discusses various methods of scoring and offers some evidence to show
that counting the correct items is probably as satisfactory as any.
Potthoff and Barnett scored a hundred-item test, given to four hundred
pupils, in eleven different ways. They had ten different judges
assign weights to each item according to importance, and used these
ten scoring schemes together with the number of items correct,
making eleven. They found correlations ranging from .96 to .98
between the various judges’ weightings and the raw-score method.
When the pupils were assigned marks according to the various scoring
systems, eighty-eight per cent of the pupils received the same mark under
all eleven methods. With the teacher whose ratings were most extreme,
only two per cent of her pupils were displaced more than .6 sigma from
their position under the unweighted scoring scheme. Scates and
Noffsinger offer evidence of high correlation between four methods of
weighting the Indiana Edison Contest, and in addition they present a
theoretical discussion of weighting. Odell, in a rather careful
experiment using various methods of weighting items in new-type
tests, concludes:


There is so little to be gained by unequally weighting the elements that it is 
not worth the labor. 
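Why weighting changes so little can be seen by scoring the same papers both ways and correlating the two sets of scores, which is in effect what Potthoff and Barnett did on a large scale. The sketch below uses five invented pupils, four items, and arbitrary importance weights of 1, 2, 1, and 3; none of these figures come from the studies cited, and the product-moment correlation is written out by hand so the block needs nothing beyond the standard library.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def raw_score(marks):
    """Number of items right (marks[i] is 1 if item i is right, else 0)."""
    return sum(marks)


def weighted_score(marks, weights):
    """Sum of the judge's weights for the items answered correctly."""
    return sum(w for mark, w in zip(marks, weights) if mark)


# Hypothetical data: five pupils, four items, one judge's weights.
weights = [1, 2, 1, 3]
pupils = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
raw = [raw_score(m) for m in pupils]
weighted = [weighted_score(m, weights) for m in pupils]
print(f"r between raw and weighted scores = {pearson_r(raw, weighted):.2f}")
```

With these invented data the correlation is about .98. A sample this small proves nothing by itself, but it shows the mechanism: pupils who know more items tend to rank high under any reasonable set of positive weights.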


Scoring Weighted Item Tests.—A number of tests are scored several
times, with the items given a different weight each time. Among
them are the Strong Vocational Interest Blank, the Garretson-Symonds
Interest Questionnaire, and the Bernreuter Personality Test. Reference
to scoring methods for these tests may be out of place in this article,
but since some of them may be helpful with other objective tests they
are included. Rulon and Arden suggest a method of scoring such
tests using Hollerith machinery. Stagner makes a clever scoring
suggestion which can be used in any school system where mimeograph
and adding machines are available. His procedure requires, first,
that all weights be changed to positive numbers by adding the
necessary constants to all weights. Second, a mimeograph sheet is
made out which contains the scale values of the responses for all
scorings. The headings and the top line of such a sheet for the
Bernreuter test would appear as follows:


[Specimen score sheet for the Bernreuter test: columns headed Yes, No, and ?, under which the scale values for each scoring are entered.]
After the pupil has filled out the test, a score sheet is made out for
each pupil. If the pupil answered yes to item 1, a pencil line would be
drawn under the numbers in the Yes columns, and so on for all the
items. The numbers having lines under them could then be added on
a machine. If a wide-carriage adding or calculating machine is used,
the N, S, I, and D columns could all be added at the same time.
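Stagner's procedure can be sketched in a few lines. The weights below are hypothetical, not the actual Bernreuter scale values, and only two scales (N and D) are shown. Raising every weight in a scale by the same constant adds that constant once for every item answered, so the score on the original weights can be recovered by subtracting the constant times the number of items; that final subtraction is an assumption of this sketch, not part of Stagner's description as summarized here.

```python
def shift_weights(weights):
    """Raise every weight in a scale by one constant so none is negative.

    weights: {scale: {item: {response: weight}}}
    Returns (shifted_weights, constant_added_per_scale).
    """
    shifted, constants = {}, {}
    for scale, items in weights.items():
        lowest = min(w for responses in items.values() for w in responses.values())
        constant = -lowest if lowest < 0 else 0
        constants[scale] = constant
        shifted[scale] = {
            item: {resp: w + constant for resp, w in responses.items()}
            for item, responses in items.items()
        }
    return shifted, constants


def score_pupil(answers, shifted, constants):
    """answers: {item: response}. Adds the underlined (shifted) values for
    each scale, as on the adding machine, then removes the constant once
    per item answered to recover the score on the original weights."""
    totals = {}
    for scale, items in shifted.items():
        added = sum(items[item][resp] for item, resp in answers.items())
        totals[scale] = added - constants[scale] * len(answers)
    return totals


# Hypothetical weights for two items on two scales (not real Bernreuter values).
weights = {
    "N": {1: {"Yes": -3, "No": 2, "?": 0}, 2: {"Yes": 4, "No": -1, "?": 1}},
    "D": {1: {"Yes": 2, "No": -2, "?": 0}, 2: {"Yes": -1, "No": 3, "?": 0}},
}
shifted, constants = shift_weights(weights)
print(shifted["N"][1], constants)
print(score_pupil({1: "Yes", 2: "No"}, shifted, constants))
```

For the answers shown, the shifted N values add to 2 and the D values to 9; subtracting the constants (3 and 2, each taken twice) gives −4 and 5, the same totals the original signed weights would produce.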

Miscellaneous Scoring Problems.—Cuff furnishes data to indicate
that the use of an answer sheet for true-false or multiple-choice tests,
on which the pupil crosses out the number of the correct answer, is a
time saver. He suggests that after the pupils have indicated their
responses, the sheets can be run through the mimeograph machine,
which has a stencil cut so that it will encircle the correct answers.
Tests can thus be scored “. . . from two to six times more rapidly than
when answers are indicated opposite the test items.”

Newcomb and Goodwin Watson® experimented with having 
graduate students score their own papers. They gave a test near the 
end of the period and at the end of the period collected the papers. 
The papers were then scored without any mark being placed on them.
The following day they were returned to the students, who scored
them. Their findings are shown in the following table:

NUMBER OF PAPERS
Returned with the same score .............................. 165

It would appear that if graduate students do not cheat, they tend to
make mistakes in one direction. Goodrich has shown that pupils
can be taught to correct spelling papers satisfactorily.

Sims shows that when essay questions are scored by grouping
them into piles and using scoring rules, the results of seven gradings
by as many readers tend to agree. Since his findings show much more
agreement than is usually the case, they are given in detail. He found:

The average correlation between a reader and the average of other readers 
was .97. The lowest correlation between any reader and the average of the others 
was .96. 

The average correlation between a reader and each of the other readers was .94. 
The lowest correlation between any two readers was .91. 

For the discussion questions only, the average correlation between a reader 
and the average of the other readers was .92. The lowest correlation between 
any reader and the average of the other readers was .83. 

For the objective questions only, the average correlation between a reader 
and the average of the other readers was .97. The lowest correlation between any 
reader and the average of the others was .95. 
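Several of Sims's figures are correlations between one reader's marks and the average of the other readers' marks on the same papers. A sketch of that computation follows; the three readers and four papers are hypothetical, and the correlation is again written out by hand so the block is self-contained.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length lists of marks."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def reader_vs_others(marks):
    """marks[r][p] is the mark reader r gave paper p.

    Returns, for each reader, the correlation between that reader's marks
    and the average of the marks the other readers gave the same papers.
    """
    n_readers, n_papers = len(marks), len(marks[0])
    results = []
    for r in range(n_readers):
        others = [
            sum(marks[o][p] for o in range(n_readers) if o != r) / (n_readers - 1)
            for p in range(n_papers)
        ]
        results.append(pearson_r(marks[r], others))
    return results


# Hypothetical marks: three readers, four essay papers.
marks = [
    [12, 15, 9, 18],
    [11, 16, 10, 17],
    [13, 14, 8, 19],
]
for reader, r in enumerate(reader_vs_others(marks), start=1):
    print(f"reader {reader}: r with average of others = {r:.2f}")
```

The average and the lowest of these per-reader coefficients correspond to the first pair of figures Sims reports.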


Other excellent studies on the same problem have been reported by
Sims (references 72, 76, and 77).

Summary.—Studies on scoring methods seem to lead to the fol-
lowing conclusions:

1. Weighting of items either by difficulty or by judgment is not 
worth the trouble. It seems needless to make further studies of this 
problem. 

2. The scoring of personality and interest tests can be simplified 
through the use of a mimeograph sheet with the score values on it. 

3. An answer sheet which can be run through the mimeograph so
that the correct answers are marked on it appears to be a time saver.

4. Some graduate students do not score their own test papers
accurately when they feel that no check is possible.

5. Essay questions can be scored by grouping the questions into 
piles and using scoring rules so that there is agreement between 
judges. 


SPECIAL PROBLEMS PECULIAR TO THE TRUE-FALSE TEST 


Indeterminate Items.—Indeterminate items have been discussed by
Weidemann alone and with Newens. An indeterminate question is an
item of information which almost all persons would consider a
controversial issue. Weidemann states that:


The indeterminate statement may have a wide and extensive use in the class-
room from the viewpoint of the student as a basis from which to develop directed
classroom discussion.


He suggests a method of responding to such items by marking 1
if the item is considered true, 2 if more likely true than doubtful,
3 if doubtful, 4 if more likely false than doubtful, and 5 if false. The
other references describe how a key was made out for the test and
how the directions were studied.

Oral vs. Printed.—Do students prefer true-false tests which are
written or those which are given orally? There are two recent studies
of this problem, and their findings do not agree. A study of five
college classes by Stumpf showed that eighty-four per cent of the
pupils preferred the written form of the test and sixteen per cent the oral.
Crawford found that after students had considerable experience with
both types, forty-two out of one hundred twenty preferred the mimeo-
graphed form, forty-three preferred the oral, and thirty-five showed
no preference. The fact that the students had had experience with
both types for some time gives one confidence that Crawford’s
results are probably closer to the truth. The principal reasons which
pupils in Stumpf’s study gave for preferring the written form were
that they could control the time and that they avoided misunderstandings.
The reasons given for preferring the oral form were that immediate
reactions were more accurate and that decisions once made were held to.
Stumpf correlated the oral and written forms with each other and
with intelligence, but had so few items in his tests that the findings
are probably insignificant. Briggs and Armacost found that the
reliability of the oral form compared favorably with that of printed tests.

Guessing.—Krueger® in using experimental material with which 
the students were unfamiliar found that they gave fifty-one responses 
of true to forty-nine responses of false. Apparently they do not 
favor the true response more often than the false, especially where no 
element of suggestiveness is present. In further work he found that 
poor students seemed to be more certain of their judgments than did 
the better students and the portion of error was markedly larger 
among the poor students. Another careful experimental study by 
Krueger* showed that “Chance guessing may frequently yield very 
high and very low scores in short tests (up to sixty items).” 
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In the longer tests this probability is practically eliminated. When 
the right minus wrong formula is applied to the longer lists 
“Practically every score yields a final mark of approximately zero.” 
In contrast to this study is an inadequately reported study by Melbo.**
He evidently had students indicate for each item on a list whether
the answer was based on exact knowledge, part knowledge, or guessing.
He disregarded the fact that when they said they absolutely
knew the answer, they were right only eighty-seven per cent of the
time, and concluded that, because they were right 58.5 per cent of the
time on the guessed items, the R-W method of scoring is wrong.
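Krueger’s point about short and long tests can be checked with a small calculation. The sketch below is illustrative, not Krueger’s procedure: it assumes a pupil who guesses blindly, with an even chance, on every item of a true-false test scored right minus wrong, and asks how often such a pupil reaches at least one-fifth of the maximum score. The function names and the one-fifth threshold are hypothetical choices. Because R − W = 2R − n has a spread of only √n points, the chance share of the test shrinks as the test lengthens; for twenty items the probability is about one in four.

```python
from math import comb, sqrt


def chance_of_rw_at_least(n_items: int, min_score: int) -> float:
    """Probability that a pupil guessing blindly (even chance per item) on an
    n_items true-false test earns a right-minus-wrong score >= min_score."""
    favourable = 0
    for rights in range(n_items + 1):
        wrongs = n_items - rights
        if rights - wrongs >= min_score:          # R - W score for this outcome
            favourable += comb(n_items, rights)   # number of ways to get it
    return favourable / 2 ** n_items


def rw_spread(n_items: int) -> float:
    """Standard deviation of R - W under pure guessing: R - W = 2R - n and
    Var(R) = n/4, so the spread is sqrt(n) score points."""
    return sqrt(n_items)


for n in (20, 60, 200, 500):
    p = chance_of_rw_at_least(n, n // 5)          # at least 20% of the items
    print(f"{n:4d} items: P = {p:.4f}, spread / length = {rw_spread(n) / n:.3f}")
```

Exact binomial counts are used rather than a simulation so the result is reproducible.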

Summary.—Investigations dealing with true-false tests seem to
indicate: 

1. The use of indeterminate statements may have instructional 
value. 

2. Students probably are evenly divided as far as preference for 
oral or written true-false tests is concerned when they have had 
considerable experience with both types. 

3. Where there is no element of suggestiveness, students give 
practically the same number of true as false responses to items. 

4. Poor students seem to be more certain they are right and more 
often wrong than the better students. 

5. On long tests the R-W formula eliminates the chance of high 
scores through guessing. 

The previous summary“ showed that specific determiners existed 
in large numbers and should be used equally in true and false state- 
ments, and that second impressions tend to be more often right than 
wrong. 


STUDENTS’ ATTITUDES TOWARD TESTING 


Students in science prefer objective tests to the essay type accord- 
ing to the data presented by Hurd** and Diamond.” Hurd found 
that physics students on the college level preferred objective tests,
largely because such tests covered more ground. Diamond studied
the preferences of high school students, finding that they also preferred
objective tests. He also found that pupils preferred tests made out 
by other pupils and tests where graphic records of results were kept. 


NEW TYPES OF TESTS 


Most of the suggestions for new forms are modifications of the 
present type in some particular. Holmes and Heidbreder™ suggested 
the use of the wrong-answer type of true-false question, which was
illustrated in the section on validity. This is merely a slight adaptation
of previous proposals. Weidemann*-'™ pointed out the possibilities 
of using the indeterminate item as a five response item. Haven and 
Copeland** and Sheidemann® suggested marking more than one 
response in the multiple-choice items. Hevner™ suggested a test 
for use in measurement and statistics courses which requires the 
matching of diagrams of frequency distributions with problems. 
Gerberich” describes a test requiring students in measurement courses
to evaluate test items. Brooks® suggests a list of one hundred terms
in which the pupil underlines those he can define and then defines
every fifth term underlined. The difficulty is that pupils can “beat”
the scoring scheme he suggests.
A slightly different essay test has been suggested by Clark. About 
eight statements are presented. The pupil indicates whether he 
thinks the statement is true or false, then writes several paragraphs 
defending his position. 

A published set of objective questions* for use in high school 
physics classes has been presented by Kirkpatrick.** The teacher 
can select questions which are suitable for the class. 

Four excellent suggestions for new test forms in English have 
appeared. Stalnaker** offers suggestions for the development of new 
types of objective tests in English Composition. He* also presents 
several most excellent objective methods of measuring various phases 
of outlining and organization. Carroll’ describes the standardization 
of his test to measure prose appreciation.† Trager®* gave a refreshing
account of her efforts to construct a literary appreciation test. Selec- 
tions of varying degrees of ‘‘goodness” were given and the pupil was 
to rank them. The samples are included in the article. 


POPULARITY OF OBJECTIVE TESTS 


A study by Lee*? surveyed the testing practices of nearly five 
hundred secondary schools and of sixteen hundred teachers in seventy 
of these schools. Teacher-made tests were given eight times as 
frequently as were standardized tests. The teachers were asked to 
indicate the type of test which they principally used. Sixteen per 
cent of the teachers indicated that they made principal use of the
essay test, seventy-four per cent of the objective type, and ten per cent
of both types in combination. The most interesting finding was
that only two per cent of the teachers stated that they made exclusive
use of essay-type questions. It would appear that in a relatively
short span of twelve years most secondary teachers have been
converted to the use of objective tests.

* “A Pupil-Teacher Handbook of Objective Test Exercises in High School
Physics,” published by Public School Publishing Company.
† Published by the Educational Test Bureau.


TRENDS IN TESTING 


There are several trends which appear significant enough to mention.
In many schools the teachers are cooperatively building objective tests
which measure the course they are teaching. An example of such work
appears in the article by Willman”
which describes an English test constructed by the teachers of the 
J. Sterling Morton High School. Objective tests are being included 
in courses of study. In some cases they are included as examples for 
the teachers, in others, for actual use. The Course of Study of Los 
Angeles on ‘‘The Story of Civilization’’™ includes self-tests for the 
pupils on the important points of the unit. 

An article by Segel’® provides an excellent discussion of the trends
in testing, including to a limited extent trends in objective testing. 
Lee** in summarizing opinions of administrators and research directors 
as to needed developments in secondary school measurement quoted 
various comments, some of which follow: 

The tests are planned here in the departments and changes are made each year 
to meet the needs of the curriculum. I find that an imposed testing program does 
not arouse the same support that is given one worked out by teachers. 


We have had teachers from each school in the city plan tests for the entire 
city. In this case there was stimulated considerable interest. 


He found that, according to the opinions of research directors,
“training teachers in making and using tests” and “better tests”
tied for first place in their list of needed developments.
‘‘Constructing new-type tests based on the course of study” ranked 
next. 

These trends all emphasize the need for continuous development of 
objective tests which will measure the objectives of the local course of 
study.


MISCELLANEOUS PROBLEMS 


Spelling.—Foran™ presented a summary of studies dealing with
the form of presenting spelling words. After making a comprehensive 
study of the various types he concludes: 


“Recognition tests of spelling are generally unsatisfactory.”

He found that there was little difference between the various recall
types of testing. Lindquist and Cook” showed that spelling tests
involving recognition plus recall yield higher validities than the
recognition type alone. Corrections for guessing increased the validity
of two-response types of spelling test.

Faults in Tests.—Diamond" in an excellent study presents the
faults of the standardized tests in general science and biology. Though
this study was based on standardized tests, the same faults might be
even more likely to occur in teacher-made tests. He found that the
principal faults or errors in the tests were:

1. False generalization. 

2. Failure to keep up with scientific progress.

3. Theory mistaken for proven fact.

4. Lack of scientific classification. 

5. Lack of scientific definition. 

6. Errors of tradition. 

Errors not based on subject-matter: 

1. Ambiguity. 

2. Spelling and typographical errors. 

3. Lack of difficulty in test items. 


If such errors appear in published tests, one wonders what errors
would appear if teacher-made tests were analyzed.

Anderson and Lindquist! illustrated difficulties in making out 
matching items in history tests. Elsewhere they*® have discussed 
the multiple-choice tests in history. Barr* has shown the difference 
in the types of questions asked by good teachers and those asked by 
poor teachers. 

Methods of Studying for Tests.—Terry® showed that students on 
the college level prepared differently for essay than they did for 
objective tests. The emphasis on detail was outstanding in the 
methods of preparation for the objective tests while the preparation 
for the essay test emphasized the study of larger units of subject- 
matter. Crawford'!! presented methods suggested by graduate stu- 
dents for studying for the various types of tests. 


IMPORTANT DISCUSSIONAL REFERENCES 


There are several recent articles and books which are significant 
and which should be read by those interested in objective tests. 
Cason® and Sims’ discuss the comparative merits of the essay and 
objective tests. Sims’s article, which is the better of the two, concludes:
The most satisfactory testing procedure for most school subjects would involve
the supplementing of objective tests with essay questions definitely designed to 
measure these higher order habits of relationship and organization. 


Lefever* presents a timely article pointing out pitfalls to be avoided 
in constructing teacher-made tests. McClaskey®? urges the use of the 
same tests from semester to semester with the pupil recording the 
answers on an answer sheet. Where each page covers one unit or 
phase of a unit and can be changed easily, this suggestion appears
highly desirable. Hurd*! discusses the use of a preliminary test and 
its value in directing instruction. 

An excellent discussion of testing on the college level is presented 
by Holzinger.* Park® has shown the importance of the problem of 
constructing objective tests in measurement courses. A good treat- 
ment of objective test questions for mathematics has been given by 
Durell and Durell.” Principals’ opinions toward testing have been
summarized by Trillingham.** In an article directed to teachers of 
biology, Tyler® presented his point of view on test construction: 


If we are to improve the tests in biology it must be done by formulating all of 
the important objectives which we are trying to reach in our biology teaching and 
then to develop tests or examinations which will give us evidence of the degree to 
which pupils are attaining these important objectives. 


The best source of sample tests has been presented by Ruch and Rice.® 
They have included complete copies of thirty-six prize-winning
teacher-made tests in the various fields. Symonds* and Odell*’ discussed
the testing program on the high school level. Trends in testing have
been excellently discussed in the previously mentioned article by
Segel.” Needed developments in secondary school measurement
have been pointed out by Lee.** Present tendencies in measurement
have been treated by Woody and Sangrin. 4

The problem of building tests in history has been presented in a 
most valuable manner by Lindquist.“ He not only discusses the 
problems involved but gives examples of well constructed questions 
which would be most suggestive to any teacher of history. 
NOTES ON VALIDATION OF TEST ITEMS BY
COMPARISON OF WIDELY SPACED GROUPS 


DAVID F. VOTAW 
Southwest Texas Teachers College 


If a test contains too large a proportion of items highly selective
of superior students, the distribution of scores made on it by a normal
class will be positively skewed. Instead of a gracefully tapering
curve at both ends of the distribution, the lower end of the scale will
be heavily loaded with inferior scores. It is the belief of the writer
that one reason for this all too prevalent condition (particularly with 
true-false and multiple-choice tests) is faulty technique used in vali- 
dating the items of the test. The technique in common use for dealing 
with widely spaced groups is too severe in its demand for selectivity 
of an item. Thus many items are rejected which in fact are entirely 
satisfactory in qualities of selectivity. These items, if retained, will 
eliminate much of the skewness found in distributions of scores made 
on the completed test. 

The technique referred to is as follows: 


1. Administer preliminary items to a sample of the population for which
standardization is desired.

2. Determine for each item the mean proportion of the lower twenty-seven 
per cent who answer correctly. 

3. Determine for each item the mean proportion of the upper twenty-seven 
per cent who answer correctly. 

4. Determine for each item the difference between the means of these pro- 
portions. 

5. Compute for each item the probable error of the difference between the two
means.

6. Compute for each item the ratio of the difference to its probable error. (If
the difference is in favor of the upper group by as much as three times its probable 
error—about a twenty to one probability that it was not due to chance—the 
difference is generally regarded as significant and the item is retained for the 
final test.) 
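Steps 2 through 6 of this conventional technique can be sketched in a few lines. This is a minimal sketch under stated assumptions: the probable error of the difference is taken as .6745 times the standard error of a difference between two independent proportions (the article does not write this formula out, though it is the one implied by the graph equation given later), and the function name `old_plan_retains` is hypothetical.

```python
from math import sqrt

PE_FACTOR = 0.6745  # one probable error = 0.6745 standard errors


def old_plan_retains(right_upper: int, right_lower: int,
                     n_upper: int, n_lower: int, k: float = 3.0) -> bool:
    """Retain the item if the upper group's proportion answering correctly
    exceeds the lower group's by at least k probable errors of the difference."""
    p_u = right_upper / n_upper                  # step 3
    p_l = right_lower / n_lower                  # step 2
    diff = p_u - p_l                             # step 4
    # step 5: PE of a difference between two independent proportions
    pe = PE_FACTOR * sqrt(p_u * (1 - p_u) / n_upper + p_l * (1 - p_l) / n_lower)
    if pe == 0:                                  # both proportions are 0 or 1
        return diff > 0
    return diff / pe >= k                        # step 6


# Fifty-eight students in each group, as in the example that follows.
print(old_plan_retains(50, 40, 58, 58))
print(old_plan_retains(45, 40, 58, 58))
```

With forty of the lower group right, fifty right in the upper group clears the three-PE bar, while forty-five does not.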


A PROPOSED NEW PLAN 
PRINCIPLE (a)


The above technique appears to be sound with the exception of
processes (2) and (3) in the event that true-false or multiple-choice
items are being validated. In those instances comparisons should be
made on the basis of the proportion who know the answer to an item
instead of on the basis of the proportion who answer it correctly.

For example, consider a true-false item which had been administered
to a preliminary group of two hundred thirteen students. The
upper twenty-seven per cent and lower twenty-seven per cent will 
consist of fifty-eight students each. 


$$N = 58, \quad p = \tfrac{1}{2}, \quad q = \tfrac{1}{2}, \quad \text{mean} = Np = 29.$$


Therefore twenty-nine correct responses may be expected in the event
that all of the fifty-eight students made random guesses at the answer.
In other words, twenty-nine correct responses to the item indicate that
none of the students knows the answer. For converting the number
of correct responses into the number who know the answer, the general
equation

$$x = 2R - N$$

may be used, in which x is the number knowing the answer, R is the
number of correct responses, and N is the total number of responses.
(It is assumed throughout, of course, that students were instructed
to respond to all items.)
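The conversion can be justified in one line. This sketch rests on the text’s own assumption that every student answers every item, and adds the assumption that a student who does not know the answer guesses with an even chance of being right. Writing x for the number who know, the N − x who guess contribute half their number to the R correct responses:

$$R = x + \tfrac{1}{2}(N - x) \quad\Longrightarrow\quad 2R = x + N \quad\Longrightarrow\quad x = 2R - N.$$

With N = 58, R = 29 gives x = 0 (pure guessing) and R = 58 gives x = 58.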
In the problem under consideration fifty-eight should be substituted
for N and successive numbers from zero to fifty-eight should be
substituted for R. A graph of the equation provides a means of
making rapid conversions. After making the conversions for both
the upper and lower groups, computations of the proportions knowing
answers may be made for each item, and the criterion of a difference
for each item favorable to the upper group and equal to at least three
probable errors of the difference may be applied. For this purpose a
graphical method proposed by the writer¹ will be found convenient.
The equation for the graph is


$$p_1 = \frac{2p_2 + h^2 + \sqrt{(8h^2 + 4h^4)(p_2 - p_2^2) + h^4}}{2(1 + h^2)}$$

in which $h = .6745k/\sqrt{N}$, $k = 3$ (if it is desired to maintain a
difference of three PE), $N$ = number of students in either group,
$p_1$ = proportion of upper group who know answer, and $p_2$ = proportion
of lower group who know answer. The equation maintains a
relationship between $p_1$ and $p_2$ such that $p_1$ always exceeds $p_2$ by
exactly three probable errors of their difference. The graph may be
projected by an obvious reversal process into the negative field for use
with the new plan.
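The graph equation can also be evaluated directly instead of read from a graph. This is a sketch under stated assumptions: equal groups of N students, k = 3, and hypothetical function names `critical_upper` and `_root`. For the negative field it assumes that the “obvious reversal process” means reflecting the curve through the origin, so that a lower-group value of −u calls for an upper-group value of −l whenever l (lower) and u (upper) lie on the original curve; the article does not spell this out. Under that reading a lower group at −18 requires roughly −9 in the upper group, as in Table I, and a lower group at −12 is satisfied by an upper group at −4, as in the text.

```python
from math import sqrt

PE_FACTOR = 0.6745


def _root(p: float, h2: float, sign: int) -> float:
    """A root of (p1 - p2)^2 = h^2 (p1 q1 + p2 q2), solved for one proportion
    given the other (p); sign=+1 gives the larger root, -1 the smaller."""
    disc = (8 * h2 + 4 * h2 * h2) * (p - p * p) + h2 * h2
    return (2 * p + h2 + sign * sqrt(disc)) / (2 * (1 + h2))


def critical_upper(p2: float, n: int, k: float = 3.0) -> float:
    """Smallest upper-group proportion knowing the answer that exceeds the
    lower-group proportion p2 by k probable errors (n students per group)."""
    h2 = (PE_FACTOR * k) ** 2 / n        # h = .6745 k / sqrt(N), squared
    if p2 >= 0:
        return _root(p2, h2, +1)         # Votaw's equation as printed
    # ASSUMED reading of the "reversal process": reflect through the origin.
    return -_root(-p2, h2, -1)


N = 58
for lower_knowing in (42, 22, 2, -18, -38):
    upper = critical_upper(lower_knowing / N, N) * N
    print(f"lower group knowing {lower_knowing:4d}: upper must know {upper:6.1f}")
```

The printed values are close to, but not always identical with, the figures the author read from his graph.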


1 Graphical Determination of Probable Error in Validation of Test Items.
Journal of Educational Psychology, Vol. XXIV, 1933, pp. 682-686.
Clearly many items which would have been rejected under the
old plan will be saved by use of the new plan. For example: 


TABLE I.—ILLUSTRATION OF POSSIBLE SAVING OF TRUE-FALSE ITEMS BY USE OF NEW PLAN
(Fifty-eight Responses in Each of the Upper and Lower Groups)

              Old plan                                     New plan
If in lower      To be retained it     In lower group   It must be known   Which will be        Difference
group item is    must be answered      answer was       in upper group     answered correctly
answered         correctly in upper    known by         by                 in upper group by
correctly by     group by
     50                 56                  42                 50                 54                  2
     40                 49                  22                 32                 45                  4
     30                 41                   2                  8                 33                  8
     20                 30+                −18                 −9                 25                  6
     10                 20                 −38                −28                 15                  5
      0                  4                 −50                −42                  8                  4

In validating a true-false test of three hundred seventy-five items 
the new plan saved for the writer seventy-one items which would have 
been rejected by the old plan. 


PRINCIPLE (b)


The use of the new plan uncovers a problem that has been ignored
under the old plan. It is the problem of the use and interpretation
of negative numbers knowing answers. To state that a negative
number of students knows the answer to a true-false or multiple-choice
item is simply another way of saying that the item is so stated as to
“trick” students into making incorrect responses, that the key is wrong,
or that some other serious fault exists to render it invalid. The
probability of such conditions existing may be determined for any
negative result. In the case of a true-false item, if N = 58, p = ½,
and q = ½, the mean number is Np, or 29.


$$PE_{M} = .6745\sqrt{Npq} = .6745\sqrt{58 \cdot \tfrac{1}{2} \cdot \tfrac{1}{2}} = 2.57$$





More specifically, if all fifty-eight of the best students made random
guesses at the answer to a true-false item, there would be a fifty-fifty
chance that the number getting it right would fall within 2.57 below or
above twenty-nine. If fewer than 26.43 get the item right, the probability
is greater than ¾ that the small number of right answers was due to
causes other than pure chance, and the item should be rejected for
that reason alone. The critical number of correct responses for the
upper group, therefore, is 26.43. In practice, then, no true-false item
responded to correctly by fewer than twenty-seven students of the upper
group should be retained. Application of the conversion formula will
show −4 to be the number knowing the answer in that case. The maximum
number in the lower group who may know the answer to the same item
and yet maintain a difference insuring selectivity is −11, but since −11
cannot occur in fact (with N = 58, 2R − N is always even), −12 is the
highest number in the lower group who may know the answer and still
render the item acceptable when the upper group indicates −4 knowing.
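The principle (b) cut-off can be computed rather than worked by hand. This is a minimal sketch, assuming blind guessing among n equally likely responses and the same .6745 probable-error factor; the function name `chance_floor` is hypothetical, and rounding the cut-off up to a whole student mirrors the author’s “to be practicable” step. It reproduces the true-false figures above and also covers the four-response case treated later in the article.

```python
from math import ceil, sqrt

PE_FACTOR = 0.6745


def chance_floor(n_students: int, n_choices: int):
    """Principle (b) cut-off for an upper group of n_students answering an item
    with n_choices responses, if every student guessed blindly.

    Returns (mean, probable error, fewest correct answers accepted,
    corresponding number 'knowing' the answer)."""
    p = 1 / n_choices                          # chance of a right guess
    q = 1 - p
    mean = n_students * p                      # Np
    pe = PE_FACTOR * sqrt(n_students * p * q)  # .6745 * sqrt(Npq)
    min_correct = ceil(mean - pe)              # one PE below the chance mean
    # conversion from correct answers to number knowing: (nR - N) / (n - 1)
    knowing = (n_choices * min_correct - n_students) / (n_choices - 1)
    return mean, pe, min_correct, knowing


for label, choices in (("true-false", 2), ("four-response", 4)):
    mean, pe, min_correct, knowing = chance_floor(58, choices)
    print(f"{label}: mean {mean}, PE {pe:.2f}, "
          f"reject if fewer than {min_correct} right ({knowing:g} knowing)")
```

An item whose upper group shows fewer knowing than the returned value would be rejected on principle (b) alone.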

Table 2 below is an abridgment of a more complete table showing 
differences which must be maintained to insure selectivity for the 
special case of fifty-eight students in each group. 


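TABLE II.—CRITICAL DIFFERENCES TO INSURE SELECTIVITY

If in the lower group the answer        It must be known in the upper group
is known by (number of students)        by at least (number of students)

[Most entries of this table are illegible in the scanned source. The legible entries of the right-hand column, from the top, read: rejected, rejected, 58, 56, 49, 40, 31, 20, 11; the last legible entries read −10, −3, and −4.]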


It is easy to see that the application of principle (b) will result in
some items being lost which would have been retained by the applica-
tion of the old plan. However, the statistical evidence indicates that
such items should be discarded. In dealing with three hundred
seventy-five true-false items the writer found only fifteen the answers
to which were known by fewer than −4 students of the upper twenty-seven
per cent (fifty-eight students). All but six of these would have
been rejected on the basis of some other criteria, some being actually
adversely selective and others failing to maintain satisfactory
differences between the proportions of upper and lower groups knowing
the answers. Under the old plan, however, nine of them would have
been retained.
TABLE III.—COMPARISON OF NUMBERS OF ITEMS ACCEPTED BY OLD PLAN AND BY NEW PLAN
(True-false)

Old plan ......................  Accepted, 189            Rejected, 186
New plan ......................  Accepted    Rejected     Accepted    Rejected
  Principle (a) ...............  [entries partly illegible in the source; legible figures: 180, 109]
  Principle (b) ...............                  9*                      6†
Total new plan ................  Accepted, 260            Rejected, 115‡

* Three of these are duplicated in the one hundred nine rejections.
† All six of these are duplicated in the one hundred nine rejections.
‡ The sum of nine, six, and one hundred nine less nine duplications.
Although the discussion so far has been limited to the true-false
type of test item for specific illustration, the principles of the new plan
apply with equal force to multiple-choice items. Again it is necessary
to make comparisons on the basis of proportions knowing the answer
to an item instead of proportions responding correctly. The general
equation for making the conversion is

$$x = \frac{nR - N}{n - 1}$$

in which x is the number knowing the answer, n is the number of
possible responses to each item, R is the number of correct answers to
the item, and N is the number of students involved.
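The general conversion follows from the same one-line argument as the true-false case. This sketch assumes that a student who does not know the answer picks at random among the n responses, and so is right with probability 1/n:

$$R = x + \frac{N - x}{n} \quad\Longrightarrow\quad nR = (n - 1)x + N \quad\Longrightarrow\quad x = \frac{nR - N}{n - 1}.$$

For n = 2 this reduces to x = 2R − N. For n = 4 and N = 58 it gives x = (4R − 58)/3, so that 14.5 correct answers correspond to x = 0 and thirteen correct answers to x = −2.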
The same group of two hundred thirteen students was given a
four-response multiple-choice test of one hundred forty-one
items. Again fifty-eight students constitute each of the upper and
lower groups to be compared. The above equation then becomes

$$x = \frac{4R - 58}{3},$$

which may be graphed for convenient conversion of number of correct
answers into number knowing answer. The remainder of the process of
principle (a) will be identical with the technique described for
true-false items.
Principle (b) of the new plan applied to four-response multiple-choice
items will differ from its application to true-false items only
in that a different mean number and a different probable
error of the mean will be obtained. In this case:

$$N = 58, \quad p = \tfrac{1}{4}, \quad q = \tfrac{3}{4}$$

$$\text{Mean number} = Np = 14.5$$

$$PE_{M} = .6745\sqrt{Npq} = 2.22$$


In other words, if there is one hundred per cent ignorance of the
correct answer to an item, 14.5 correct answers may be expected by
chance. The probable error of the mean number indicates that the
probability is ½ that the number of correct answers will be within the
range 12.28 to 16.72. Consequently, thirteen correct answers to a
four-response item by the upper group are the fewest which may be
accepted safely. Reference to the conversion formula shows that in
such a case −2 know the answer. Therefore all items yielding fewer
than −2 knowing in the upper group should be rejected for that reason alone.

The writer lost only eight of the one hundred forty-one items as a
result of the application of principle (b), five of which would have been
lost by some other criterion. On the other hand thirty-four items
which the old plan would have rejected were saved by principle 
(a) of the new plan. 


TABLE IV.—GAIN IN POINT SCORES DUE TO USE OF NEW PLAN
(From 375 True-false Items)

                                          Upper twenty-seven    Lower twenty-seven
                                          per cent (fifty-      per cent (fifty-
                                          eight students)       eight students)
By old plan (one hundred eighty-nine
accepted items):
    Sum of point scores ..............          6796                  37*
    Mean point score .................        117.17                 .64†
By new plan (two hundred sixty
accepted items):
    Sum of point scores ..............          8615                 832*
    Mean point score .................        148.53               14.34‡

* Algebraic sum of positive and negative scores.
† Thirty-two students made negative or zero scores.
‡ Only two students made negative or zero scores.
The value of retaining these additional items by principles
which are statistically sound may be illustrated by data from final 
point scores on the true-false test discussed in this paper. Clearly, 
greater differentiation of students near the lower end of the scale is 
accomplished. 

As would be expected the upper group gained more than the 
lower, but the important result is that many scores occupying posi- 
tions under the old plan on or near the zero end of the scale were 
shifted by the new plan to higher intervals. 


SUMMARY 


1. In determining if a true-false or a multiple-choice test item is 
selective of “good” students, the comparison of superior students with
inferior students should be made on the basis of the proportions of the 
two groups who know the answer to the item and not on the basis of 
the proportions answering the item correctly. 

2. Many items which would be rejected because of doubtful 
selectivity if comparison of upper group with lower group were made 
on the basis of proportions answering correctly will be retained if 
comparison is made on the basis of proportions knowing answer. 

3. If a probability greater than ¾ exists that superior students
were “tricked” into incorrect responses to an item, the item should
be rejected. 

4. Use of the proposed new plan reduces the tendency of scores to 
“bunch” at the zero end of the scale. 
THE RELATIONSHIPS BETWEEN THE THEORIES OF
GESTALT PSYCHOLOGY AND THOSE OF A 
PROGRESSIVE SCIENCE OF EDUCATION 


JOHN W. CARR, JR. 
Duke University 


During the last quarter of a century progressive education in the 
United States has developed on an empirical and philosophical basis. 
Accepted psychology, proceeding on mechanistic and atomistic 
assumptions, has been considerably out of harmony with the methods 
and materials which have been improvised by artistic teachers and
which found a theoretical justification in the philosophy of John
Dewey and W. H. Kilpatrick. During this period the generally
accepted point of view in psychology has been largely inadequate 
because it has emphasized almost exclusively structural, as opposed 
to functional, analysis. Structural analysis is the type most fre- 
quently used in natural sciences. It reduces a whole to its component 
parts. Functional analysis, on the contrary, finds out what happens 
to a whole under conditions which modify or preserve it. A func- 
tional analysis of the learning process recognizes that experience is 
a unit in place and time and concludes that learning, which depends 
upon experience, cannot be completely explained by such atomistic 
concepts as S-R bonds, conditioned reflexes, and associations. The 
school of psychological investigation which has most widely used
functional analysis is called Gestalt or configurational psychology.

In 1890 Ehrenfels, a German psychologist, introduced the concept 
of form, or pattern quality. He defined Gestaltqualität as a property
possessed by the whole which is not found in any of the parts making 
up the whole. Basing investigations largely on this concept, and 
transferring several of the theories of modern physiology, biology, 
and physics into the field of psychology, a group of German investi- 
gators—led by Max Wertheimer, Wolfgang Köhler, and Kurt Koffka—
have discovered abundant evidence to support the theory that the 
whole of psychological processes is not merely the sum of distinct 
parts. Professor R. M. Ogden in his text Psychology and Education²
showed something of the meaning of these investigations for students
interested primarily in Educational Psychology. More recently





1 Wheeler, R. H. and F. T. Perkins: “Principles of Mental Development.”
Thomas Y. Crowell Co., 1932, pp. 239-241.
2 Ogden, R. M.: “Psychology and Education.” Routledge, London, 1926.
R. H. Wheeler and F. T. Perkins have published a book¹ which gives
a complete statement of the principles of mental development from 
the Gestalt point of view. 

It is interesting to note that even prior to the emergence of an
educational psychology based on “configurationism,” there arose a
school of educational philosophy which has proposed many of the
hypotheses for which Gestalt psychology has produced scientific
evidence. This philosophy, growing largely out of the work of
John Dewey and W. H. Kilpatrick, has proposed that the whole
child should be considered as changed by a particular teaching
procedure; that total, rather than isolated, learnings are important; that
parts should be learned in relationship to the whole in which they 
belong. Educational investigations prompted by this school of 
educational philosophy have been severely critical of many of the 
practices of the schools as they are organized at present. The use of 
so-called homogeneous grouping based on partial measurements of the 
child’s achievements has been questioned; the over-emphasis on 
standard tests as a measure of what teachers should be trying to 
accomplish has been criticised; current classroom procedures based on 
assigned learning of meaningless material, on formal, monotonous 
drill, have been questioned. Purpose and need have been assigned a 
central place in the learning process, and curriculum materials have 
been organized around the purposes and needs of children so as to 
produce the outlines of a course of study which disregards the artificial 
barriers between subjects and presents subject-matter in its relation- 
ship to the total experience of the child. 


This progressive movement has been handicapped by an educational
psychology which is partial and inadequate. It is true that
attempts have been made to re-state the S-R bond theory, associa- 
tionism, behaviorism, and hormic psychology in such fashion that they 
might be used to support the practice of large-unit or project teaching; 
such attempted interpretations seem inadequate, however, because 
these schools of psychology have analyzed behavior into minute, 
atomistic, mechanistic units and have broken the child’s personality 
into relatively segregated traits, which are supposed to originate in 
distinct instinctive urges. Even though it might be argued with 
justification that these psychologies are not necessarily atomistic in 
their point of view, in the educational applications of their theories 
a functional analysis of behavior seems to have been neglected. The 


1 Wheeler, R. H. and F. T. Perkins: Op. cit. 
idea that behavior emerges from a prior-existing, unique whole, the
personality, seems to have been disregarded in these psychologies.

It is interesting to explore in more detail this parallel between 
the theories of the Dewey-Kilpatrick philosophy and the hypotheses 
from which Gestalt psychology approaches the problem of behavior. 
A further discussion of this relationship will indicate the possibility 
of building up an adequate psychological support for some of the 
theories and practices of modern education. The philosophy in which 
much progressive teaching practice takes root may be briefly sum- 
marized under four heads: First, education should be a process of 
continuous growth; second, the growth of a child should proceed in 
a unified, integrated fashion, so as to adjust the individual to an 
enriched environment; this adjustment is both a socializing and an 
individualizing process, one that involves mutual readjustments 
between the individual’s desires and social institutions; the modern 
social environment is dynamic, ever-changing, so that the future
cannot be predicted; third, purpose or need is the essential condition
of learning; real learning, as contrasted with pseudo-learning, is creative in its
nature, involving invention or discovery on the part of the learner; 
hence, reactions are best learned in situations where they are used, or 
under conditions where they are related in the understanding of the 
learner to real situations; fourth, the end toward which education 
should be directed is the development of a socially integrated per- 
sonality. What support do the theories of Gestalt afford for such a 
philosophy of education? 

First, the concept of education as a process of continuous growth 
is supported by the Gestalt hypothesis of the nature of learning. 


It is inevitable, under normal conditions, that the child learn something, as
inevitable as the fact of physical development, for learning is that aspect of growth
that is observed through the individual’s behavior. Learning is the realization of a
growth potential, set up by a physical and social environment, just as a gain in
height and weight results from a growth potential obtained from oxygen and food.¹


The Gestalt theory suggests how this growth takes place. The 
process is one of expansion, in which wholes become more and more 
differentiated. Investigations of the growth process in animals 
while in the embryonic state and studies of the first movements 
of embryos indicate that in this gradual differentiation the whole 
governs the development of the parts; that the embryo first shows 





1 Wheeler, R. H. and F. T. Perkins: Op. cit., p. 8. 
general and crude responses, which become more refined as maturation
takes place. In a similar fashion, the first responses of a human being 
who is confronted with a relatively new situation are often crude and 
general in their nature. As learning takes place a more specific 
type of reaction appears, and a differentiated, structured, pattern of 
behavior evolves. The hypothesis is that this growth produces a
configuration, an organization, a whole which is not merely a sum- 
mation of distinct parts. 

Such a theory of the learning process seems to support the prac- 
tice of teaching subjects together, a procedure which is found in 
modern schools. The integration of subject-matter around child 
activity is held to be especially important during the first few years of 
schooling. This educational theory holds that, to the little child, 
experience is general and undifferentiated. As maturation takes 
place, there is a gradual, continuous, reorganization of experience so 
that more specific and differentiated structures develop out of the 
evolving whole. 


The various sorts of learnings in time grow into organized bodies which one 
may legitimately call subjects. The child thus eventually comes into a knowledge 
of subjects of study. The knowledge takes logical organization as the child con- 
tinuously articulates new learnings with previous, similar learnings. As adult life 
approaches, he comes to have adult organizations of subjects, which are usually 
thought of as more or less complete organizations. 


In terms of the Gestalt theory of mental development, the above 
may be stated thus: As the child matures mentally, the logical organ-
izations which adults recognize as subjects of study emerge from a
general, undifferentiated experience which is characteristic of early 
childhood. 


A second coincidence between the theories of Gestalt and those of
modern education has to do with such problems as the relationship
of the individual to the social order. In its biological origins, this
problem has its roots in the question of the relationship of an organism 
to its environment, and leads into the old problem of the relative 
importance of nature and nurture in the development of an organism. 
The school of educational philosophy founded by Dewey has favored 
a point of view which recognizes the unity of the individual and the
social order; which regards biological heredity, not as a static, pre-

Mossman, Lois C.: “Teaching and Learning in the Elementary School.”
Houghton Mifflin Co., 1929, p. 113.

destined set of limitations, but rather as a dynamic force which is
distinctly relative to the environment in which it develops.¹ Progres-
sive educators have stressed the necessity of enriching the environ-
ment in order that the child might attain complete development.


Gestalt psychologists have assembled from experiments in biology and
physiology much evidence which may be interpreted to indicate that
inheritance is only one factor in a unified process of growth; that the 
finality with which the biological heritage fixes the growth potential 
of an organism is distinctly relative to environmental forces; that 
hereditary tendencies can be changed by environmental factors.²
The educational implications of this theory are that marked improve- 
ment in personality and intelligence can be produced in children if 
the environment of the early years is properly controlled. The 
function of environment is to raise the growth potential represented 
in hereditary forces, not merely to permit the expression of a static 
force; similarly, the function of education is to create intelligence 
through the provision of a stimulating environment.³ The so-called
constancy of the IQ depends upon a constancy in the environmental 
conditions. 

The Gestalt theory denies the existence of both fixed structures 
and learned functions, if these be regarded as separate classifications.⁴
Structures or instincts, which other schools of psychology ascribe to 
heredity, and the functioning of these, which certain psychologists 
have supposed to be adaptations to environment, are but two aspects 
of one growth potential. Those conditions of development which are 
for the time being under the control of the racial heritage are usually 
classified as heredity; those growth factors which are at present subject 
to control through a change in surrounding conditions are customarily 
classified as environment. Gestalt theory rejects such a dual classifica- 
tion and considers all factors affecting growth as phases of the same 
growth potential. Such a point of view in regard to the old problem 
of nature versus nurture seems to support the theory of progressive 
educators who hold that an improved environment can be used to 


1 Dewey, John: “Democracy and Education.” Macmillan, 1923, pp. 86-88.
2 Coghill, G. E.: “Anatomy and the Problem of Behavior.” Macmillan, 1929.
Sharp, L. W.: “Introduction to Cytology.” McGraw-Hill, 1926. It should
be especially noted that the interpretations which Gestalt psychologists have given
to such studies differ from the explanations given in the original investigations.

3 Lewin, Kurt: Environmental Forces in Child Behavior and Development.
Handbook of Child Psychology (Carl Murchison, Editor). Clark University Press,
Worcester, Mass., 1931, pp. 94-127.

4 Köhler, W.: “Gestalt Psychology.” Liveright, 1929, Chapter IV.
remove limitations on development which are, according to certain
theories, largely fixed by heredity.¹

The unification of heredity and environment under a single concept,
growth potential, may be extended to imply that the individual and 
society, conceived in an ideal sense, are phases of one whole. An 
ideal social organization would provide the maximum of growth poten- 
tial for each of the individuals composing it. The personality of 
each individual would be completely integrated with the social whole, 
and the social order would be constantly changing so as to provide 
for completely developed individuals. This is the theory on which 
Dewey bases his concept of a democracy.²

A third point of agreement between the philosophy of progressive 
education and the theories of Gestalt psychology is found in their 
concept of the nature of learning. The theory of learning which forms 
the basis of modern educational practice may be summarized in the
following statements: First, learning takes place in response to a 
need; conscious learning is guided by the purpose or intention of the 
learner; second, learning is a creative process, depending upon inven- 
tion and discovery rather than upon mere repetition of the reaction 
to be learned; third, learning enters into life to the extent that it is 
meaningful to the learner: This implies that reactions are best learned 
when they belong to a total experience, that material should be 
learned in its proper relationships. Gestalt psychology has proposed 
hypotheses and has accumulated data which support such a theory 
of learning.⁴





1 Wheeler, R. H. and F. T. Perkins: Op. cit., Chapter IX. 

2 Dewey, John: Op. cit., Chapter VII.

3 Kilpatrick, W. H.: Statement of Position. The Twenty-Sixth Yearbook of the
National Society for the Study of Education, Public School Publishing Co., Blooming-
ton, Ill., 1928, Chapter X.

4 Adams, Donald K.: Restatement of the Problem of Learning. British Journal
of Psychology, October, 1931.

Adams, Donald K.: Experimental Studies of Adaptive Behavior in Cats.
Comparative Psychology Monographs, Vol. 6, 1929, pp. 1-162.

Birenbaum, Gita: Das Vergessen einer Vornahme. Psychologische Forschung,
Vol. XIII, 1930, pp. 218-284.

Schwarz, Georg: Über Rückfälligkeit bei Umgewöhnung. 1. Teil. Psycho-
logische Forschung, Vol. IX, 1927, pp. 86-158.

Schwarz, Georg: Über Rückfälligkeit bei Umgewöhnung. 2. Teil. Psycho-
logische Forschung, Vol. XVIII, 1933, pp. 143-190.

Zappan, Georg: Ubarkeit verschiedener Aufgaben. Psychotechnische Zeit-
schrift, Vol. VII, 1932, pp. 1-29.

Lewin, Kurt: “Vorsatz, Wille und Bedürfnis.” Julius Springer, Berlin, 1926.
The school of psychology with which we are concerned holds that
the child’s activity has from the beginning, and throughout, a char- 
acter which may be called purposiveness. An objective statement of 
this theory is: Learning results from striving toward a goal; the end 
which is sought has two reference points, the environmental situation 
and the condition of stress within the organism. These two points of 
reference are phases of a single life situation, part of which is external 
to the organism and part of which is within it. The whole is a field 
of force, characterized by positive and negative differences in potential. 
The nature of the field forces determines the reactions of the organism:
A positive force brings about approach and assimilation, while a 
negative force produces withdrawal and expulsion. Kurt Lewin has 
proposed the theory that learning is the result of a process which closes 
the gap in the field of force. The child, under proper environmental 
stimulation, strives continually toward desirable goals; achievement 
of an end produces a closure which relieves an internal tension. 

The problem-project method of teaching is the correlate of this 
theory of the nature of learning. Adams has defined adaptation as 
“something which gets done about a need.” Modern educational
theory holds that the school should meet the present needs of the
child, reveal more worth-while needs, and make it possible for the 
child to satisfy them. 

The concept of closure which has been stated by Koffka is identical 
with the notion of “complacency” which has been proposed as the
foundation of human behavior by Raup, a student of the Dewey-
Kilpatrick philosophy.¹ He suggested that the attempt of the organ-
ism to maintain a condition of equilibrium with the environment is
fundamental to behavior. Raup’s discussion parallels that of the 
Gestalt psychologists in that the material used from physics, physi- 
ology, and biology is very similar. 

When learning is thought of as the result of the individual’s effort 
to find the configuration which fits a new situation, intelligence is 
given a part in the process. Intelligence functions through an insight 
into the conditions of the problem. Repetition and random trial 
and error become symptoms of learning, but not essential phases of 
the process. The learner creates a configuration which is new at 
least for him, even though it most frequently is a former discovery of 
the race. All learning becomes self-expression and has in it that 


1 Raup, R. B.: “Complacency.” Macmillan, 1925, p. 144.
Koffka, K.: “The Growth of the Mind.” Harcourt, Brace: New York, 1926.
element of creativeness which progressive educational practice has
emphasized in recent years. 

The Gestalt theory of learning does not deny that repetition and 
trial and error take place; it does hold that these phenomena are not 
essential to the process. Such mechanical fumbling takes place
either because the problem is beyond the present stage of the child’s develop-
ment (maturation) or because of improper pacing (pacing being the adaptation of
subject-matter to the present capacity of the learner). The Gestalt
psychologist would not deny that repetition must be frequently pro- 
vided in school situations, but he would regard the large amount of 
drill characteristic of present-day school procedure as the result of 
inaccurate perception on the part of students of the configuration 
which fits the situation presented. This point of view is supported 
by the experience of teachers, who have noted that intelligent children 
often learn without doing the same thing repeatedly and without
making wrong reactions. 

The theory that learning is creative does not mean that the child 
must invent anew the things that the race has previously learned; it 
does not imply that the learner can not make use of guidance from 
the teacher and imitate the configuration which has been discovered 
by the race. The child who is learning to write makes use of the 
letter forms and the hand-writing movement which exist in his environ- 
ment; however, subjectively considered, the process of learning to 
write is one of invention, discovery, and self-expression for any par-
ticular child. Learning to write is “the gradual adjustment of a
movement to a gradually developing perception of form” where the
motor and sensory processes constitute a single, unified configuration. 
The teacher can help by showing the pupil how to do it, but each 
individual child must create the configuration for himself. Of course, 
there are types of learning which involve much more of creativeness 
than does hand-writing, but the difference is one of degree rather than 
of kind. 

As a corollary of the theory that learning involves the development 
of a configuration, it follows that materials should be presented for 
learning in their proper relationships. The largest Gestalt which the 
child is able, at a given time, to assimilate is the best learning material 
for that child. Gestalted experiences produce better learning than 
mere collections of isolated elements. Numerous experiments have 
indicated that, in memorizing material, the whole method is superior 
to the part method. This may be explained as follows: When the 
subject memorizes by studying the whole selection, the details are
organized into a total configuration, each element in its proper rela- 
tionship; the subject who uses the part method learns a jumble of 
details which have no organization. 

Adults can best organize the material which they learn into logical 
subjects of study, but such an organization is understandable only 
because the various subjects have become differentiated out of the 
total experience of the learner. Material for assimilation by young 
children should be presented in its relationship to child experience, 
which is much more unified and less differentiated than that of adults.¹
Gestalted experience for young children consists of activity units 
which cut across the lines of school subjects; a mature student can 
obtain an organized experience from the detailed treatment of a single 
subject because his total experience has become more differentiated; 
he can see the manifold relationships between the subject which he is 
studying and other phases of his life experience. The little child 
working on a project and the mature student organizing data on a 
problem dealing mainly with material from a single school subject— 
each can be obtaining a gestalted experience, each may be learning 
material in its proper relationships. 

Goodwin Watson in a recent article pointed out the importance of 
gestalted experience in education.² He stated that the advantages of
a configuration over a mere collection of elements are that the former 
is easier to recall, provides the possibility for intelligent action within 
it, and gives meaning to its parts. Watson’s concept of gestalted 
experience is similar to the notion of organization which educators, 
since the time of the Herbartians, have held to be an essential quality 
of educative subject-matter. 

The fourth point at which the hypotheses of Gestalt psychology 
support the theories of modern education relates to the socially 
integrated personality as the end of education. Psychological theory 
which sees the personality as an organized, developing whole has 
produced much more fruitful practice in the field of personality train- 
ing than has the point of view that the character of an individual can 
be analyzed into various distinct bundles of traits like initiative, 





1 Quotations from John Dewey. The Twenty-Sixth Yearbook of the National
Society for the Study of Education, Chapter XII.

2 Watson, Goodwin: Wholes and Parts in Education. Teachers College Record, 
November, 1932. 
cooperation, courtesy, honesty, and self-control. Psychoanalysis
has been helpful in correcting personality defects because it gives the
patient a working picture of the total organization of his personality.
Goodwin Watson¹ cites a study by Dr. Helmut von Bracken which
illustrates the value of studying the personality as an organized whole, 
rather than as a collection of parts: 


He (Dr. von Bracken) collected a number of trait ratings (piece by piece) and 
short pen portraits (gestalted) of the same children. He found, in the first place, 
that the personality sketches informally made could be matched with the children, 
while the sum or profile of ratings, although containing many more facts, could not 
be successfully matched. The most interesting part of this experiment consisted in 
asking questions the answers to which were not given either in the ratings or in 
the case descriptions. He found that those who had read the pen sketches were 
markedly more successful in getting “hunches” about other characteristics of the
children than were those who had gone over the rating scales. 


Gestalt psychology has proposed a number of laws which control 
the development of personality.² The theory underlying these laws
is that the personality, a psychological organism, develops according 
to the same principles as does the physiological body. One of the 
most important of these principles is that the personality develops 
from a simple, relatively undifferentiated structure at birth into a 
complex, differentiated behavior-pattern; and that specific traits 
develop as the whole growing structure strives toward balance, 
harmony and order. Defects come into existence as the individual, 
in conflict with the repressions of the social environment, fights to 
maintain the unity of his personality. The undesirable quirks which 
certain people develop result from wrong methods of maintaining 
unity of personality under conditions of excessive stress. Modern 
education has been much concerned with the problems of character 
training and mental hygiene. The concept of personality as a dif- 
ferentiating, expanding, structured organization offers a promising 
basis for future study of such problems. 

This article has not undertaken to present the experimental 
evidence for either the educational or the psychological theories which 
are treated. The literature is now available, much of it in the English 
language, and the reader who is interested can delve into it. The 
writer has merely pointed out a number of parallels between a phi-

1 Ibid.
2 Wheeler, R. H. and F. T. Perkins: Op. cit., pp. 221-227.
losophy of education and the theories of Gestalt psychology. These
theories seem to offer an experimental basis for some of the educational 
practice which is no longer new. Without neglecting the contribu- 
tions of other schools of psychology, teachers should investigate the 
findings and theories of Gestalt. It is probable that the educational 
psychology of the future will find much of value in this point of view.

AN EXPERIMENTAL STUDY OF AFFECTIVE FACTORS
IN LEARNING* 


H. D. CARTER, H. E. JONES, AND N. W. SHOCK 


Institute of Child Welfare, University of California 


In his studies of emotion, published in 1921, Whately Smith
concluded that positively-toned, pleasant words are more easily
learned and retained than negatively-toned, unpleasant words. The
galvanometric technique was employed as a measure of quantitative
aspects of the emotional reaction to words. In 1929 Jones per-
formed a similar experiment, but with a number of changes in tech-
nique, particularly in the method of securing learning data. The 
results were in support of Smith’s findings, and certain suggestions 
were made concerning procedures for further experiments. 

For the words in Jones’ list, Lynch correlated learning values,
obtained from a new group of subjects, with the galvanometric 
measures previously reported. A substantial relationship was found, 
of the order of .60. Measures of retention indicated that after two or 
three weeks had elapsed the unpleasant words were inferior to the 
indifferent words in survival value. This difference was not apparent 
in measures of immediate recall. 
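Lynch’s figure of about .60 is a correlation between per-word learning values and per-word galvanometric measures. The report as summarized here does not name the coefficient; assuming it was the usual Pearson product-moment r, the computation runs as in the minimal sketch below. The five pairs of values in the example are hypothetical and are not Lynch’s data.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson product-moment correlation between two paired sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mx = sum(xs) / n
    my = sum(ys) / n
    # Sum of cross-products and sums of squares about the means.
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-word values: mean galvanometric deflection and mean learning score.
deflections = [450, 520, 610, 380, 700]
learning = [3.4, 3.6, 3.9, 3.1, 3.8]
print(round(pearson_r(deflections, learning), 2))
```

Each word contributes one pair, so the coefficient describes how the words, not the subjects, line up on the two measures.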

In a further study, Stagner obtained negative results,
and the conclusion was offered that galvanometer deflections to words 
are not important in relation to memory; it is possible, however, that 
the discrepancies found were influenced by sampling differences, 
since Stagner used Smith’s galvanometric data without attempting to 
obtain instrumental records from his own subjects. 

Balken has also reported a lack of relationship between affective
tone, galvanometer deflections, and efficiency of learning; an examina- 
tion of Balken’s procedure indicates a fundamental error in that the 
word stimuli were given at too rapid a rate to permit a galvanic 
reaction and recovery in time for the next stimulus; stimuli were 
presented at five-second intervals. It is impossible to judge the 
adequacy of Balken’s word lists, as the words are not given in the 
report, and no satisfactory evidence of their adequacy is presented. 





* The writers are indebted to Dr. H. S. Conrad and Dr. B. S. Burks, who read 
and criticised the manuscript. 
Most persons would agree that statistically reliable learning data
can be secured, and the results of several investigators* have shown
that high reliability can be attained with respect to galvanometric
measures; however, these two fundamental requirements can be
satisfied only with difficulty. A survey of the evidence indicates that
those who have denied a relationship between efficiency of learning
and affective and galvanic factors base their contentions upon inade-
quately reliable evidence; furthermore, most of these investigators
have not secured from the same group of subjects the various types of 
data needed for the argument. Hence, it is not so much the need for 
merely establishing the fact of some relationship, as the desire to 
measure it more accurately, and study its conditions, which has led 
to the present experiment. 


PROCEDURE 


The subjects in the present experiment are one hundred two sixth 
and seventh grade children from the public schools of Oakland, Cali- 
fornia. Equal numbers of boys and girls are included in the group. 
These children were brought into the laboratory for the learning 
experiments, galvanometer studies, and certain other procedures, on 
each of two afternoons and on two mornings about a week later. 
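Because each child came to both afternoon and morning sessions, the order of materials across sessions has to be balanced. In the association experiment the two word lists were split between sessions, half the subjects taking list one in the afternoon and list two in the morning and the other half the reverse, and the serial order of words was varied systematically. The sketch below assigns those orders. Cyclic rotation is only an assumed stand-in for the unspecified “systematic variation,” and the subject identifiers and word labels are hypothetical.

```python
from typing import Dict, List, Sequence


def assign_sessions(subject_ids: Sequence[str]) -> Dict[str, Dict[str, str]]:
    """Alternate subjects between the two list/session orders, half the group each."""
    plan = {}
    for i, sid in enumerate(subject_ids):
        if i % 2 == 0:
            plan[sid] = {"afternoon": "list 1", "morning": "list 2"}
        else:
            plan[sid] = {"afternoon": "list 2", "morning": "list 1"}
    return plan


def rotated_order(words: Sequence[str], subject_index: int) -> List[str]:
    """Cyclically rotate a word list so that, across subjects, each word
    occupies each serial position in turn (assumed scheme)."""
    if not words:
        return []
    k = subject_index % len(words)
    return list(words[k:]) + list(words[:k])


plan = assign_sessions([f"S{n:03d}" for n in range(1, 103)])  # 102 hypothetical IDs
print(sum(p["afternoon"] == "list 1" for p in plan.values()))  # 51
print(rotated_order(["w1", "w2", "w3", "w4"], 1))  # ['w2', 'w3', 'w4', 'w1']
```

With one hundred two subjects, each order is given to fifty-one children.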

The several steps in the project may be considered under the 
following headings: (1) Experimental selection of test materials; (2) 
evaluation of the materials as emotional stimuli; (3) an association 
experiment, with measurement of galvanic responses; (4) a learning 
experiment, with immediate and delayed recall. 

1. The Test Materials.—In the choice of test content, the decision 
was reached to use single words rather than more complex organiza-
tions of material. This was partly because of greater convenience in
experimental control, and partly because it seemed desirable to estab- 
lish definite conclusions at this level before undertaking more ambitious 
projects. A series of three hundred fifty words was presented to 
twenty adults and twenty children ranging in age from nine to sixteen 
years, with the request that they classify each word as “pleasant,”
“indifferent,” or “unpleasant.” They were also asked to suggest
additional words which they regarded as especially good examples 
of any of the three types. In this way, successive lists were prepared 





* See references 2, 3, 6, and 11 in the bibliography. 
and tested out, until data from twenty adults and twenty children
(not always the same twenty for all word lists) had been obtained for 
a total of over one thousand words. Of these, words were retained 
for further experimentation only if their meaning was known to all 
of the subjects (thus eliminating difficult words) and only if at least 
seventy-five per cent of the subjects agreed in classifying the words as 
pleasant, indifferent, or unpleasant. 
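The two retention criteria can be written as a small filter. The data representation is an assumption made for illustration: one classification per judge, with `None` standing for a judge who did not know the word’s meaning. The word labels are hypothetical. With forty judges, the seventy-five per cent rule means that at least thirty must agree.

```python
from dataclasses import dataclass
from typing import List, Optional

CATEGORIES = ("pleasant", "indifferent", "unpleasant")


@dataclass
class WordJudgments:
    word: str
    # One entry per judge: a category name, or None if the judge did not know the word.
    classifications: List[Optional[str]]


def retained_category(j: WordJudgments, threshold: float = 0.75) -> Optional[str]:
    """Return the agreed category if the word meets both criteria, else None."""
    votes = j.classifications
    if not votes or any(v is None for v in votes):
        return None  # meaning not known to every judge
    for category in CATEGORIES:
        if sum(v == category for v in votes) / len(votes) >= threshold:
            return category
    return None  # no category reached the agreement threshold


# Hypothetical judgments from 40 judges (20 adults + 20 children).
print(retained_category(WordJudgments("word A", ["pleasant"] * 30 + ["indifferent"] * 10)))  # pleasant
print(retained_category(WordJudgments("word B", ["unpleasant"] * 29 + ["indifferent"] * 11)))  # None
```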

2. The Rating of the Emotional Values of Words.—From one hun- 
dred two children, including those who were to serve as subjects in 
the learning experiment, ratings as to the pleasantness and unpleas- 
antness of the words were obtained.* The words were typewritten 
upon cards, which the subjects sorted out into five classes, class one 
including only very pleasant words, class five only very unpleasant 
words, etc. The mean ratings obtained for each word are shown in 
Table I. It is apparent that the three lists of words finally selected 
fit rather well into the three classes desired; the ranges of mean ratings
are sharply differentiated on the five-point scale. 
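The card sort yields, for each word, a mean rating between 1 (very pleasant) and 5 (very unpleasant). Table I also lists an “emotional index” that this section does not define. Its printed values are consistent with the distance of the mean rating from the midpoint 3, and the sketch below assumes that reading. It then recomputes the group means from the per-word values transcribed from Table I. The function names are illustrative only.

```python
from statistics import mean

MIDPOINT = 3.0  # centre of the five-point scale (1 = very pleasant, 5 = very unpleasant)


def mean_rating(sorts):
    """Mean of one word's sort classes (integers 1-5) across subjects."""
    if not sorts or any(s not in (1, 2, 3, 4, 5) for s in sorts):
        raise ValueError("expected a non-empty list of ratings from 1 to 5")
    return mean(sorts)


def emotional_index(mean_pu):
    """Assumed definition: distance of a mean P-U rating from the midpoint."""
    return abs(mean_pu - MIDPOINT)


# Per-word rows from Table I: (mean P-U rating, mean galvanometric deflection, mean learning score)
TABLE_I = {
    "pleasant": [(1.17, 457, 4.10), (1.18, 485, 3.73), (1.59, 490, 4.00), (1.69, 409, 3.59),
                 (1.78, 751, 3.79), (1.93, 483, 2.94), (1.94, 466, 3.75), (2.03, 562, 3.80)],
    "indifferent": [(2.51, 334, 3.62), (2.68, 358, 3.07), (2.72, 362, 3.03), (2.72, 266, 3.03),
                    (2.74, 332, 2.89), (2.77, 287, 2.66), (2.80, 398, 2.95), (2.83, 387, 3.24)],
    "unpleasant": [(4.17, 489, 3.27), (4.17, 547, 3.50), (4.19, 534, 3.39), (4.25, 528, 3.34),
                   (4.28, 624, 3.35), (4.48, 642, 3.60), (4.54, 542, 3.49), (4.66, 658, 3.15)],
}


def group_means(rows):
    """Unweighted mean of the eight word rows in one group, rounded as printed."""
    ratings, deflections, scores = zip(*rows)
    # Python's round() sends exact halves to the even neighbour (340.5 -> 340);
    # the authors' own rounding rule is not stated.
    return round(mean(ratings), 2), round(mean(deflections)), round(mean(scores), 2)


for group, rows in TABLE_I.items():
    r, d, s = group_means(rows)
    print(f"{group:12s} rating {r:.2f}  index {emotional_index(r):.2f}  "
          f"deflection {d}  learning {s:.2f}")
```

The recomputed values agree with the “Mean” rows of Table I: 1.66, 513, and 3.71 for the pleasant words; 2.72, 340, and 3.06 for the indifferent words; 4.34, 570, and 3.39 for the unpleasant words.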

3. Association Tests and Instrumental Measurements.—In this 
part of the experiment the subjects, while lying on a couch, were 
given a free association test in which records were made of association 
times, word responses, and galvanometer deflections to the stimulus 
words. The deflections were measured with a pointer galvanometer 
whose sensitivity was of the order of two microamperes per millimeter. 

The apparent skin resistance of the subject was measured by the 
substitution technique, introduced into galvanometric work by
Darrow, with the bridge balanced at one hundred thousand or
fifty thousand ohms and with an external current of eight and 
four volts respectively. The galvanometer deflections, as well as 
bed movement, respiration, blood pressure, and pulse rate, were 
recorded photographically on bromide paper five inches in width. 
The galvanometer deflections were expressed in terms of ohms change 
by calibrating the instrument as suggested by Darrow. By using
the substitution method, the current through the subject is constant 





*In explaining the failure of certain workers to obtain positive results, 
Meltzer, Stagner, and others have pointed out the desirability of having the
words rated by the persons who serve in the learning experiment. This refinement 
of technique is desirable, especially when words are used which elicit widely variable 
reactions from different persons. Only average ratings are used in the present 
report, but the individual persons’ ratings are available for use in the further 
analysis of the data. 
TABLE I.—MEAN RATINGS OF PLEASANTNESS-UNPLEASANTNESS, MEAN
GALVANOMETRIC DEFLECTIONS, MEAN LEARNING SCORES, AND
EMOTIONAL INDICES FOR EACH OF THE WORDS*

                  Mean P-U   Emotional   Mean galvanometric   Mean learning
Word              rating     index       deflection           score

“Pleasant” words
[illegible]       1.17       1.83        457                  4.10
[illegible]       1.18       1.82        485                  3.73
[illegible]       1.59       1.41        490                  4.00
[illegible]       1.69       1.31        409                  3.59
[illegible]       1.78       1.22        751                  3.79
[illegible]       1.93       1.07        483                  2.94
[illegible]       1.94       1.06        466                  3.75
[illegible]       2.03        .97        562                  3.80
Mean              1.66       1.34        513                  3.71

“Indifferent” words
[illegible]       2.51        .49        334                  3.62
[illegible]       2.68        .32        358                  3.07
[illegible]       2.72        .28        362                  3.03
[illegible]       2.72        .28        266                  3.03
[illegible]       2.74        .26        332                  2.89
[illegible]       2.77        .23        287                  2.66
[illegible]       2.80        .20        398                  2.95
[illegible]       2.83        .17        387                  3.24
Mean              2.72        .28        340                  3.06

“Unpleasant” words
[illegible]       4.17       1.17        489                  3.27
[illegible]       4.17       1.17        547                  3.50
[illegible]       4.19       1.19        534                  3.39
[illegible]       4.25       1.25        528                  3.34
[illegible]       4.28       1.28        624                  3.35
[illegible]       4.48       1.48        642                  3.60
[illegible]       4.54       1.54        542                  3.49
[illegible]       4.66       1.66        658                  3.15
Mean              4.34       1.34        570                  3.39

* The numbers of children whose scores were averaged to determine these means
are: one hundred two for mean P-U ratings and Emotional Index; fifty-three for
mean galvanometric deflections; and ninety-five for mean learning scores. The
units of measurement are as stated in the text.
when the bridge is balanced, irrespective of fluctuations in his apparent
resistance; all deflections can then be read in terms of standard units 
(ohms), permitting direct comparisons between deflections obtained 
for different subjects. 

The electrodes used consisted of zinc plates covered with kaolin 
made into a paste with saturated solution of zinc sulphate. The 
layer of paste was covered with a cotton pad soaked in a nine per cent 
sodium chloride solution; the purpose of this was to make contact 
with the skin and to protect the skin from possible irritation by the 
zinc sulphate. With the small current used (about 0.08 milliamperes) 
polarization of the electrodes is minimized. One electrode was placed 
in the palm and the other electrode on the under side of the fore-arm. 
The electrodes were constructed to maintain, so far as possible, con-
stant conditions with respect to area, pressure, and moisture.*
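The figure of about 0.08 milliamperes can be checked against the two bridge settings given earlier (eight volts with the bridge balanced at one hundred thousand ohms, four volts at fifty thousand ohms). The check relies on a simplifying assumption of ours, not stated in the report: that the whole external voltage is dropped across the balance resistance. Under that assumption Ohm’s law, I = V/R, gives the same current for both settings.

```python
def current_milliamperes(volts: float, ohms: float) -> float:
    """Ohm's law I = V / R, converted from amperes to milliamperes."""
    if ohms <= 0:
        raise ValueError("resistance must be positive")
    return volts / ohms * 1000.0


# The two settings named in the text.
for volts, ohms in [(8.0, 100_000.0), (4.0, 50_000.0)]:
    print(f"{volts:.0f} V / {ohms:,.0f} ohms = {current_milliamperes(volts, ohms):.2f} mA")
```

Both lines print 0.08 mA. Halving the voltage together with the balance resistance leaves the current unchanged.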

Figure 1 illustrates a standard obtained by balancing the bridge 
at one hundred thousand ohms and introducing changes of one thou- 
sand ohms in the same arm of the bridge in which the subject is con- 
nected. Figures 2, 3, and 4 are illustrations of the galvanometer 
deflections obtained from different subjects with varying degrees of 
reactivity. The vertical lines indicate one-second intervals. The 
curves are to be read from right to left; a falling curve denotes a 
decrease in apparent skin resistance. 
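The bridge arrangement described above can be illustrated with the textbook Wheatstone balance condition. This Python sketch assumes a conventional four-arm bridge (the arm names are hypothetical, since the source gives no circuit diagram) and assumes that, near balance, galvanometer deflection is proportional to the change in resistance, which is what a calibration standard such as Figure 1 relies on.

```python
def unknown_arm(r_a: float, r_b: float, r_variable: float) -> float:
    """At balance (no galvanometer current) r_a / r_b == r_variable / r_unknown,
    so the subject's apparent resistance is r_variable * r_b / r_a."""
    return r_variable * r_b / r_a


def deflection_in_ohms(deflection_mm: float, standard_mm: float,
                       standard_step_ohms: float = 1000.0) -> float:
    """Convert a deflection to ohms from a calibration step such as Figure 1,
    assuming deflection is proportional to resistance change near balance."""
    return deflection_mm / standard_mm * standard_step_ohms


# Equal ratio arms: the subject's resistance equals the variable arm setting.
print(unknown_arm(1000.0, 1000.0, 100_000.0))   # 100000.0
# Hypothetical: a 30 mm deflection when the 1000-ohm step gave 12 mm.
print(deflection_in_ohms(30.0, 12.0))           # 2500.0
```

Reading every deflection against the same thousand-ohm standard is what lets deflections from different subjects be compared in common units.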

The association words were divided into two comparable lists,
with the P, I, and U words distributed evenly throughout the lists;
the effects of position in series were eliminated by systematic variation
of the order of the words for different subjects. One list was given
in an afternoon session, and the other on a morning about a week
later. Half the subjects had list one in the afternoon and list two in
the morning; for the others this procedure was reversed. The experimental
list of words was always preceded and followed by three buffer
words not included in the analysis of results. The words were presented
verbally, one every twenty seconds.† For each person, the
magnitude of deflection was measured separately for each word, and
composite scores were found for the P (pleasant), I (indifferent), and U
(unpleasant) groups of words. The correlations of individuals' scores
obtained on morning and on afternoon records are shown in Table IV.
Because the galvanometric measures obtained in the afternoons are less
reliable than those obtained in the mornings (the deflections being
restricted in size), these coefficients are unduly low as measures of the
reliability of the scores used in this report; they nevertheless indicate
that even results obtained on different occasions show considerable
consistency. The split-half reliability coefficients are also given in
Table IV.

* A study of the electrical characteristics of the electrode system is being made.
† This was the standard interval; occasionally, however, a longer interval
was required so that the experimenter might make necessary adjustments of
apparatus.

Figs. 1-4. g indicates the galvanometer deflections; m the movements of the
subject on the couch; b the breathing record; t the time-marking line (a deflection
in the line marked t indicates the moment at which the word-stimulus was given).

4. The Tests of Learning and Recall. The same subjects were
tested by a paired-associates technique, both for the learning and for the
retention of the words. Here also the two lists were used, so that
in the afternoon the subjects learned the half not used in the galvanometer
experiment on that day, and on a morning in the following
week learned the other list. The task was to recall the appropriate
word when a picture was presented. No arbitrary criterion of perfect
learning was adopted; each individual was given five trials (prompting
method) with each list, and the number of correct recalls was taken
as the measure of learning. The pictures and words were shown
together for five seconds on the first exposure; in the five succeeding
learning trials the subjects were allowed five seconds in which to recall
the words in response to the pictures; in case of failure, the words were
again shown for five seconds, but when a correct response occurred the
experimenter passed on to the next picture at once. From the results
secured in this way, learning scores were obtained for each individual
for each word separately, and a composite score was determined for
the P, I, and U groups of words. By systematic alteration of the
order of presentation, possible effects of position in the series were
eliminated from the group results.
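The scoring just described (correct recalls summed over the trials, then over the words of each category) can be sketched as follows; the same summation gives the composite galvanometric scores. The word names and trial outcomes below are hypothetical, and the record layout is an assumption made for illustration.

```python
from collections import defaultdict

# Hypothetical records: word -> (category, outcome of each of the five
# prompted recall trials, True for a correct recall).
records = {
    "word_a": ("P", [False, True, True, True, True]),
    "word_b": ("P", [True, True, True, True, True]),
    "word_c": ("I", [False, False, True, False, True]),
    "word_d": ("U", [False, True, False, True, True]),
}


def composite_scores(recs):
    """Sum correct recalls over trials and over the words in each category."""
    totals = defaultdict(int)
    for category, trials in recs.values():
        totals[category] += sum(trials)  # True counts as 1
    return dict(totals)


print(composite_scores(records))  # {'P': 9, 'I': 2, 'U': 3}
```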


RESULTS 


For each word in the list, four types of score are given in Table I.*
The mean rating of pleasantness-unpleasantness is based upon ratings
by one hundred two subjects (forty-eight girls and fifty-four boys).
The mean learning score is based upon data from ninety-five of these
subjects (forty-seven girls and forty-eight boys). The mean galvanometric
deflection is based upon results from fifty-three of these
children (twenty-eight girls and twenty-five boys).† From the
mean rating for pleasantness-unpleasantness, the Emotional Index
(E.I.) has been computed by calculating the deviation of the mean
rating from the rating of three, which in the rating scheme represented
"complete indifference" and may hence be taken as an affective zero
point. The E.I. provides a measure of the intensity of the emotional tone
of the words, disregarding the distinction between pleasantness and
unpleasantness. Such an index is required for determining the
relationship between galvanic measures of emotion and verbal-report data;
it constitutes a fourth score, which is of course dependent upon the
first.

* The groups upon which these measures are based include both boys and girls.
Study of these groups separately showed that ratings, learning data, and galvanometric
deflections for boys and girls agreed to such an extent that analysis of the
results for the sexes separately is not profitable.
† The differences in size of sample are due to extraneous factors determining
availability of subjects and completeness of instrumental records.

TABLE II. RELIABILITY OF THE MEASURES SHOWN IN TABLE I*

                                          $\rho_{\frac{1}{2}\frac{1}{2}}$    $\rho_{11}$
Pleasantness-unpleasantness ratings         .95      .97
Galvanometric deflections                   .66      .79
Learning scores                             .79      .88
Emotional index                             .92      .96

* Order of the twenty-four words according to each of the four criteria was
determined for each half of the group; $\rho_{\frac{1}{2}\frac{1}{2}}$ is the correlation of the two rankings so
determined. Correction by the Spearman-Brown formula gives the reliabilities
of the measures determined for the total groups ($\rho_{11}$).

TABLE III. CORRELATIONS BETWEEN ORDERS OF WORDS AS DETERMINED BY
THE SEVERAL TYPES OF AVERAGE SCORE*

                                                              ρ      ρ corrected for
                                                                     attenuation
Pleasantness with ease of learning                           .40     .43
Emotional index with size of galvanometric deflections       .65     .75
Emotional index with ease of learning                        .61     .66
Size of galvanometric deflections with ease of learning      .49     .59
Length of word (number of letters) with ease of learning     .23

* These are essentially correlations of values assigned to the words on the basis
of data from groups of individuals who took part in several different experiments.
Since there are twenty-four words, n is in each case twenty-four. The correlations
have been corrected for attenuation using the reliability coefficients shown in
Table II.
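The Emotional Index described above is a simple transformation of the mean rating. A minimal Python sketch, assuming ratings are stored as plain numbers; the individual ratings in the example are invented for illustration.

```python
from statistics import mean

INDIFFERENCE = 3  # the rating that stood for "complete indifference"


def emotional_index(ratings):
    """E.I. = |mean rating - 3|: intensity of affective tone, ignoring
    whether the word was rated pleasant or unpleasant."""
    return abs(mean(ratings) - INDIFFERENCE)


# Invented ratings for one pleasant and one unpleasant word:
print(emotional_index([5, 4, 4, 5]))  # 1.5
print(emotional_index([1, 2]))        # 1.5
# Applied to a published mean rating, e.g. 4.19 in Table I:
print(round(abs(4.19 - INDIFFERENCE), 2))  # 1.19
```

Because the absolute value discards sign, a clearly pleasant and a clearly unpleasant word can receive the same index, which is the property needed for comparison with galvanometric deflections.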

Table II furnishes indications of the reliability of the measures 
given in Table I. These coefficients suggest that the reliability of the 
measures used is in each case high enough for comparisons between 
groups of words in each category, though not, perhaps, high enough 
for comparisons between individual words. 
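The reliabilities in Table II come from correlating the word orders obtained from two halves of the group and then stepping the result up with the Spearman-Brown formula. The sketch below shows both steps in Python. The rank-difference formula used here assumes untied ranks, and the four-word rankings in the tests are invented.

```python
def spearman_rho(ranks_x, ranks_y):
    """Rank-difference correlation: 1 - 6 * sum(d^2) / (n * (n^2 - 1)).
    Assumes untied ranks 1..n."""
    n = len(ranks_x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))
    return 1 - 6 * d2 / (n * (n * n - 1))


def spearman_brown(r_half, k=2):
    """Reliability of a measure k times as large; k=2 steps a half up to the whole."""
    return k * r_half / (1 + (k - 1) * r_half)


# Stepping up the half-group correlations of Table II:
for r_half in (0.95, 0.66, 0.79, 0.92):
    print(r_half, round(spearman_brown(r_half), 3))
```

The stepped-up values agree with the $\rho_{11}$ column of Table II to within the rounding of the half-group coefficients.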

Intercorrelations among these various measures are shown in 
Table III. Contrary to the assertions of some workers, it is apparent 
that measures of emotional tone of words, obtained on the galvanom- 
eter, are in agreement with similar measures based on ratings. It 
also appears that the pleasant words are learned better than the 
unpleasant or indifferent words, and that words which elicit large 
deflections of the galvanometer are better learned. In this group of
words, the longer words (those containing more letters) tend to be
learned slightly better than the shorter words.
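Table III reports each correlation both as obtained and corrected for attenuation. Below is a sketch of the standard correction, $r_{xy}/\sqrt{r_{xx}r_{yy}}$, applied with the $\rho_{11}$ reliabilities of Table II. Matching each reliability to its measure is an inference here: it is the assignment that reproduces the corrected column. A corrected value can exceed 1 when the reliabilities are themselves estimated with error, as in Part B of Table V.

```python
from math import sqrt


def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Estimated correlation between error-free measures: r_xy / sqrt(r_xx * r_yy)."""
    return r_xy / sqrt(r_xx * r_yy)


# rho_11 reliabilities from Table II.
REL = {"pleasantness": 0.97, "galvanometric": 0.79,
       "learning": 0.88, "emotional_index": 0.96}

TABLE_III = [  # (measure x, measure y, obtained rho)
    ("pleasantness", "learning", 0.40),
    ("emotional_index", "galvanometric", 0.65),
    ("emotional_index", "learning", 0.61),
    ("galvanometric", "learning", 0.49),
]

for x, y, rho in TABLE_III:
    print(x, y, rho, round(correct_for_attenuation(rho, REL[x], REL[y]), 2))
```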


STATISTICAL SIGNIFICANCE OF THE DIFFERENCES 


The significance of the obtained differences in learning scores for
the three categories of words may be determined by examining distributions
of scores of individual persons. It is for this reason that
the P, I, and U scores for individuals were computed. The reliability
coefficients presented in Table IV show that these scores are sufficiently
reliable for use in discovering group differences. Since a person's P,
I, and U scores are correlated, the intercorrelations shown in Table
V are needed for computing the significance of the differences found
between words of the three categories. Table VI shows the means
and standard deviations of learning scores and galvanometer measures
for the different types of words. These learning scores, it should be
remembered, are summations of raw scores, using number of correct
recalls as the measure. The galvanometer measures are summed raw
deflections recorded in ohms. A reliable difference is found between
the average deflection for the indifferent words and the average
deflection for either the pleasant or the unpleasant words. P words
are learned best, U words next, and I words least well.

TABLE IV. RELIABILITY COEFFICIENTS*

                              Galvanometric measures     Learning scores
                              n     r     PE             n     r     PE
Odd-even halves
   P                          53   .84   .03             95   .56   .05
   I                          53   .86   .02             95   .78   .03
   U                          53   .84   .03             95   .68   .04
Morning vs. afternoon
   P                          53   .62   .06             95   .46   .05
   I                          53   .52   .07             95   .66   .04
   U                          53   .42   .08             95   .79   .03

* The correlations of individual persons' scores on half the list with their scores
on the other half have been corrected by the Spearman-Brown formula to obtain
the reliability coefficients for the total list upon which the measures used in this
study are based.

TABLE V. INTERCORRELATIONS BETWEEN CHILDREN'S SCORES ON P, I, AND U*

A. Composite Galvanometric Measures. Fifty-three Children

             P          I          U
P           ...        .56        .82
I           .66        ...        .69
U           .98        .81        ...

B. Composite Learning Scores. Ninety-five Children

             P          I          U
P           ...        .68        .63
I       [illegible]    ...        .74
U          1.021      1.016       ...

* The upper half of each table shows the obtained correlations, and the lower
half shows the correlations corrected for attenuation, using the reliability
coefficients given in the upper half of Table IV.

TABLE VI. GROUP DIFFERENCES

         Learning scores*             Galvanometric deflections*
         n      Mean     SD           n      Mean     SD
P        95     29.86    4.73         53     4.25     2.97
I        95     24.56    7.51         53     2.84     2.02
U        95     27.35    5.46         53     4.56     2.94

                               Diff.    SD(diff.)    Diff./SD(diff.)
Learning scores
   P - I                       5.30     .566         9.36
   P - U                       2.51     .443         5.66
   U - I                       2.79     .498         5.60
Galvanometric deflections
   P - I                       1.41     .341         4.13
   U - P                        .31     .244         1.27
   U - I                       1.72     .292         5.89

* The learning and galvanometric measures are summations for the eight words
in each category. Learning scores are given in terms of number of correct recalls.
The mean galvanometric deflections given in this table are in thousand-ohm units.
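The probable errors in Table IV and the critical ratios in Table VI can be sketched with two conventional formulas: the probable error of r, $0.6745(1-r^2)/\sqrt{n}$, and the standard error of a difference between two means taken on the same persons, which subtracts a term in their intercorrelation (Table V). The Python below is a sketch under those assumptions. For the galvanometric P vs. I comparison it reproduces the tabled .341 and 4.13 to within rounding.

```python
from math import sqrt

PE_FACTOR = 0.6745  # one probable error = 0.6745 standard errors


def probable_error_r(r, n):
    """Conventional probable error of a correlation coefficient."""
    return PE_FACTOR * (1 - r * r) / sqrt(n)


def sd_diff_correlated_means(sd1, sd2, n, r12):
    """Standard error of the difference between two means obtained from the
    same n persons whose scores correlate r12."""
    se1, se2 = sd1 / sqrt(n), sd2 / sqrt(n)
    return sqrt(se1 ** 2 + se2 ** 2 - 2 * r12 * se1 * se2)


# Galvanometric P vs. I: means and SDs from Table VI, r = .56 from Table V.
diff = 4.25 - 2.84
sd_diff = sd_diff_correlated_means(2.97, 2.02, 53, 0.56)
print(round(sd_diff, 3), round(diff / sd_diff, 2))
print(round(probable_error_r(0.84, 53), 3))
```

Ignoring the correlation term would overstate the error of the difference and understate its significance, which is why the intercorrelations are needed.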


DISCUSSION 


The treatment of data in this report assumes that words rated as
P, U, and I by the majority of persons really are P, U, and I, respectively.
If some individual raters disagree on certain words, the effect
will be to obscure differences which may exist between learning scores
for P, I, and U words. Hence any differences found may be regarded
as all the more significant if such errors have entered into the data.
The procedure used here is regarded as appropriate in view of the
method of selecting and evaluating the stimulus materials. Further
analysis of this problem is reserved for a later paper.

Another factor to be considered is the effect of familiarity of the
words. Stagner found a correlation of .16 between memory value
of his words and frequency of use of the words (as determined from
Thorndike's Teacher's Word Book). From this he concluded that
frequency of use is not an important factor. He also found a correlation
of -.224 between memory value and size (number of letters)
of the words in his sampling.* But in Stagner's own data the
correlation between memory value and indices of pleasantness-unpleasantness
was .375. The obvious conclusion is that the effect
of emotional tone upon memory value cannot be explained away in
terms of familiarity of the words.

* This correlation is positive (.23) for the data reported in the present paper (see
Table III). It is likely that the method of learning, as well as the sampling of words,
affects the size of this correlation.


As has been pointed out above, the negative conclusions of a
number of investigators concerning the relation of galvanometric
deflections to memory values may be doubted, because of the inherent
difficulties in securing adequate data and the lack of evidence that
such difficulties had been overcome. Determination of the exact
degree of the relationship is a still more complex problem. The
correlation found in the present investigation is .49; this correlation
may be expected to vary with the reliability of the measures obtained
and with the conditions of experimentation.*

A word of caution may be in order concerning the interpretation of the
correlations in Table III. The reliability coefficients in Table II
should rise toward unity as the populations become infinitely large, and
this should cause the intercorrelations in Table III to rise, approximately
to the extent indicated by the correction for attenuation.
Those intercorrelations should approach as limits whatever values
measure the true relationship. Only by further experimentation
can one determine the values such correlations would take when
different samplings of words are used as stimuli.
SUMMARY AND CONCLUSIONS

On the basis of previously obtained data, three lists of words were
selected which subjects fairly consistently rated as pleasant (P),
indifferent (I), or unpleasant (U), respectively. Subsequent ratings
of pleasantness-unpleasantness by one hundred two sixth- and seventh-grade
children showed that the words so chosen were reliably classified,
and the average ratings for the three lists of words showed no
overlapping. Experiments using these lists of words as test material
gave the following results:

1. Ease of learning correlated .40 with estimated pleasantness,
.49 with galvanometer deflections, and .61 with the Emotional Index
(a measure of the deviation of the words from indifference, disregarding
sign). These are correlations of rank orders based on data from
groups of persons; the present interest is in the relative position of
individual words, not of individual persons. The correlations are,
essentially, correlations of scale values for stimuli. These results
indicate that there are, on the average, definite relationships between
emotional factors and ease of learning when suitable stimulus
materials are used.

2. Words which are pleasant or unpleasant tend to elicit larger
deflections of the galvanometer than words which are indifferent.
The composite deflection score for the words in the I category is
reliably different from the P and U composites, in terms of group
averages.

3. Learning scores of individuals tend to be highest for pleasant
words, next for unpleasant words, and lowest for indifferent words.
Statistically reliable group differences in composite learning scores
are found for the words in the P, I, and U categories.

* In the further work which is being performed, a wider selection of words is
being used, and increased reliability is being obtained in the measures of learning
and of galvanometer deflections. The data which will be available later will make
possible more detailed treatment of a number of factors which add to the
complexity of the general problem. Study of sex differences and study of repression,
for example, may profitably be postponed until further data have been collected.
EQUATING TEST SCORES


HERBERT MOORE AND HELEN TRAFTON¹
Mount Holyoke College 


In common with most colleges, the Board of Admission at Mount
Holyoke has for some time been puzzling over the significance of IQs
(1) in relation to academic promise and (2) in relation to one another,
when different candidates offer IQs obtained on different tests. In
order to discover their relative and predictive value, the Psychology
Department was asked to give the four most commonly used tests to
the Freshman class; to correlate the results with the two forms of
the Scholastic Aptitude Test and with the semester grades during the
students' eight terms in college; and, by means of some reliable method,
to equate the scores on each test with equivalent raw scores on the others.
The correlations of the tests with one another and with the first
semester ranks have been computed, and on the basis of these this
preliminary report is made.








TABLE I
Test                          Score range          Median
[illegible]                   [illegible]            191
[illegible]                   [illegible]            167
[illegible]                   [illegible]            101
[illegible]                   [illegible]             67
Stanford-Binet IQ             [illegible]-122        110
Scholastic aptitude V         338-768                535
Scholastic aptitude M         300-739                515
First semester rank           47-96 percent           76











The tests used in this study were: Terman, Form A; Otis Self-Administering,
Form A; Army Alpha, Form 8; and Miller, Form A.
They were given on four successive Friday afternoons during November
to a Freshman class of two hundred thirty-five. In addition to
the group tests, the Stanford-Binet was given to one hundred forty-five
members of the same class by Senior students who had had one
semester's training in giving the Stanford-Binet scale. It is recognized
that the results from the individual test are unreliable because of the
lack of experience of the testers and because of the ease of the
Stanford-Binet scale in the upper ages. However, the correlations may
still have some value.

¹ Elizabeth Bail and Lillian Beaupain helped in correcting the tests and working
out the correlations.

The raw score range and median for the different tests are given in 
Table I. 

The correlations between the different tests are given in Table II. 








TABLE II

Test                           1      2      3      4      5      6      7      8
1. [illegible]
2. [illegible]
3. [illegible]
4. [illegible]
5. [illegible]
6. Scholastic aptitude M      .04    .16    .26    .26    .32    ...    .09    .13
7. Scholastic aptitude V      .60    .59    .36    .62    .48    .09    ...    .30
8. First semester rank        .19    .35    .55    .37    .19    .13    .30    ...





























The probable errors ranged between .02 and .06. 

The correlations are similar to those found by others who have
given the same tests to comparable groups, except for the strikingly low
correlations of Scholastic Aptitude M and of First Semester Rank
with the other measures. The lack of correlation between Scholastic Aptitude
M and V is common knowledge. The discrepancy between First
Semester Rank and all the other measures, with the exception of Army Alpha,
may be due to teachers' grading discrepancies, students' adjustment
problems, and the duplication during the first semester of work
already done in preparatory schools.

The lack of very high correlation between the different tests is
shown clearly if the results are divided into quartiles. On the basis of
these results, one seems justified in concluding that the
score on any one test gives very little indication of academic promise
or of the score on any other test.





TABLE III
Decile difference        0     1     2     3     4     5     6
Number of cases         48    53    45    51    20    12     6
Percentage of class     20    23    20    22     8     5     2























More satisfactory results, however, are secured if the raw scores are
converted to percentile ranks and the average percentile rank on
the different tests is compared with the percentile rank in academic
work. This was done for each member of the class, and the number
and percentage of those who fell in the same or in different deciles
on the two measures were found. Table III gives the results.

TABLE IV. RELATIVE VALUES OF SIX INTELLIGENCE TESTS

Miller   Army Alpha   Otis   Terman   Scholastic Aptitude V   Scholastic Aptitude M
120      195-212      75     217      717                     666
119      194          75     215      707                     658
118      192          75     214      698                     650
117      191          75     213      689                     643
116      190          75     211      680                     635
115      188          74     210      671                     627
114      186          74     209      661                     620
113      185          73     208      652                     611
112      183          73     207      644                     605
111      182          73     205      634                     597
110      181          72     204      625                     590
109      179          72     203      616                     582
108      178          71     202      608                     575
107      176          71     201      599                     567
106      175          70     199      589                     559
105      173          70     198      580                     552
104      172          69     197      571                     544
103      170          69     196      562                     536
102      169          68     194      553                     529
101      168          68     193      544                     522
100      166          67     192      535                     514
 99      164          66     191      526                     506
 98      163          66     190      517                     499
 97      162          65     188      508                     491
 96      160          65     187      499                     484
 95      159          64     186      490                     476
 94      157          64     185      481                     469
 93      156          63     183      472                     461
 92      154          63     182      462                     453
 91      153          62     181      454                     446
 90      151          62     179      442                     436
 89      150          61     178      435                     430
 88      148          61     177      426                     423
 87      147          60     176      417                     415
 86      145          60     175      408                     408
 85      144          59     174      399                     401
 84      142          58     172      390                     393
 83      141          58     171      381                     385
 82      140          57     170      372                     378
 81      138          57     169      365                     372
 80      137          56     167      354                     363
 79      135          56     166      345                     355
 78      134          55     165      336                     347
 77      132          55     164      326                     340
 76      131          54     162      318                     332
 75      129          54     161      308                     325
 74      128          53     160      299                     317
 73      126          53     159      291                     310
 72      125          52     158      281                     302
 71      123          52     156      272                     294
 70      122          51     155      263                     286
 69      120          51     154      254                     279
 68      119          50     152      244                     271
 67      117          49     151      235                     264
 66      116          49     150      227                     257
 65      115          48     149      218                     249
 64      113          48     148      208                     241
 63      112          47     146      200                     234

These results seem to indicate that the average percentile rank
on a number of tests correlates highly with academic position, although
the percentage of differences is high enough to indicate a wide
difference between students' abilities, as the tests measure them, and
academic accomplishment. Teachers' marks are, of course, not
infallible, and they may be responsible for some of the discrepancies.
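The comparison behind Table III can be sketched as follows: convert each test's raw scores to percentile ranks, average them for each student, and compare the decile of that average with the decile of the student's academic percentile rank. The percentile-rank convention (the midpoint of each rank), the arbitrary breaking of ties, and the data are assumptions; the article does not specify them.

```python
from collections import Counter


def percentile_ranks(scores):
    """Percentile rank of each score at the midpoint of its rank.
    Ties are broken by position (an assumption)."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    for position, i in enumerate(order):
        ranks[i] = 100.0 * (position + 0.5) / n
    return ranks


def decile(percentile):
    """Decile 0-9 of a percentile rank between 0 and 100."""
    return min(int(percentile // 10), 9)


def decile_differences(test_scores, academic_scores):
    """Tally |decile(average test percentile) - decile(academic percentile)|.
    test_scores holds one list of raw scores per test, students in the same order."""
    per_test = [percentile_ranks(scores) for scores in test_scores]
    n = len(academic_scores)
    average_pr = [sum(ranks[i] for ranks in per_test) / len(per_test)
                  for i in range(n)]
    academic_pr = percentile_ranks(academic_scores)
    return Counter(abs(decile(a) - decile(b))
                   for a, b in zip(average_pr, academic_pr))


students = list(range(1, 11))  # ten hypothetical students
print(decile_differences([students, students], students))
```

Dividing the tally by the class size gives the "percentage of class" row.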

The relative values of the different tests are given in Table IV.
These have been computed by means of the Standard Measure Method
suggested by Kelley,¹ and in general agree with the comparative
values of the different tests given by Runnels.²
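The sketch below shows how a table like Table IV could be generated. It assumes that Kelley's standard-measure method treats two raw scores as equivalent when they lie the same number of standard deviations from their respective means. The means and standard deviations are hypothetical; the article does not report the Mount Holyoke values.

```python
from dataclasses import dataclass


@dataclass
class ScoreScale:
    """Mean and standard deviation of raw scores on one test (hypothetical values)."""
    mean: float
    sd: float


def equivalent_score(raw, source, target):
    """Linear (standard-measure) equating: the target score having the same
    z = (raw - mean) / sd as the source score."""
    z = (raw - source.mean) / source.sd
    return target.mean + z * target.sd


# Hypothetical scales, not the Mount Holyoke figures:
test_a = ScoreScale(mean=100.0, sd=10.0)
test_b = ScoreScale(mean=500.0, sd=100.0)
for raw in (90, 100, 110):
    print(raw, equivalent_score(raw, test_a, test_b))
```

Because the mapping is linear, equal steps on one test correspond to equal steps on another, which matches the roughly constant increments down each column of Table IV.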


¹ Kelley: "Statistical Method." P. 114.
² Runnels, R. O.: "Manual for Determining the Equivalence of Mental Ages
Obtained from Group Intelligence Tests." Pp. 10, 11.




























THE PREDICTION OF SOME MEASURES OF 
VOCATIONAL ADJUSTMENT ON THE BASIS OF 
TESTS GIVEN EIGHT YEARS BEFORE AND OF 
THE SAME TESTS GIVEN TWO YEARS AFTER 

THE FACT PREDICTED¹


IRVING LORGE AND ZAIDA F. METCALFE 


With the assistance of The Staff of the Division of Psychology of the Institute of 
Educational Research, Teachers College, Columbia University 


The guidance expert is faced with the problem of predicting the
status of individuals at some future time. The status being predicted
is some function of traits and abilities objectively measured at the
time of prediction. Guidance has implicitly assumed that objective
measures taken years before the status is reached would correlate with
it as highly as objective measures taken at the same time as the status
itself is measured. Guidance has faith that a test or battery of tests of
traits or abilities given at age fourteen would predict vocational success
at age twenty as well as the same tests given at age twenty.

As part of a study of the prediction of vocational success,² retests
are available on some one hundred forty boys. In 1921-1922 two
groups of boys were tested with various tests of intelligence, clerical
ability, and mechanical adroitness. The boys age group² was composed
of all pupils aged 12.0 to 15.0 in an elementary school in a
neighborhood of low economic status and of South European stocks.
The boys grade group² was composed of a fairly representative sampling
of pupils in the second term of the eighth grade in Manhattan.
In 1932-1933 some one hundred forty of the original groups were
retested with some of the same tests that they had taken over ten
years before. By means of these reexaminations, the relative value of
long-time and short-time prediction can be evaluated.

As was pointed out in a previous article,³ the boys who were
retested could not be considered a sample from a population
different from the one of which they were a part in 1921-1922. Nevertheless,
in all the comparisons made, the coefficients are recalculated
using only the individuals of the retest population.





¹ This investigation is part of a study made possible by grants from the
Commonwealth Fund and the Carnegie Corporation.

² Thorndike et al.: "Prediction of Vocational Success." Commonwealth
Fund, 1934.

³ Lorge et al.: "Retests after ten years." This Journal, pp. 136-141.
The boys age group was tested with the following tests, among
others, in 1921-1922:

1. Thorndike-McCall Reading Scale.

2. I. E. R. Arithmetic.

3. Stenquist Assembly Test.

4. I. E. R. General Clerical Test.

The boys age group was tested with the same tests, and in addition
height and weight were taken. In 1932-1933 the same tests were
given to all the volunteers who came to Teachers College for retest.

The correlations between scores on the various tests over the ten-year
interval between administrations were as follows:

Thorndike-McCall reading scale         .57    n = 163
[illegible]                            .60    n = 164
[illegible]                            .66    n = 163
I. E. R. general clerical test         .63    n = 160
[illegible]                            .47    n = 132
[illegible]                            .63    n = 132


We are estimating long-time and short-time prediction against two
criteria:

Criterion twenty-two. Earnings per year at age 20.0 to 22.0, corrected
for cost of living, value of the dollar, and degree of employment.
Criterion twenty-four. Average liking for job, age 20.0 to 22.0.

The details of the derivation are given elsewhere.⁴

The correlation coefficients between the criteria and the various
tests at the two intervals were computed separately for the boys grade
group and for the boys age group. The coefficients between criterion
twenty-two, salary per year for age 20.0 to 22.0, and the following
tests are, for the boys grade group:

Boys grade group                             Test in      Retest in
                                             1921-1922    1932-1933
Thorndike-McCall                               .10          -.01       n = 106
Stenquist assembly                             .24           .23       n = 106
I. E. R. arithmetic                            .15           .13       n = 106
I. E. R. general clerical                      .05           .09       n = 106
Thorndike-McCall and I. E. R. arithmetic       .14           .07       n = 106
[illegible]                                    .30           .25       n = 86
[illegible]                                    .11           .29       n = 86

















⁴ Thorndike et al.: "Prediction of Vocational Success." Commonwealth
Fund, 1934.
For the boys age group, the correlations between criterion twenty-two
and the various tests are:

Boys age group                               Test in      Retest in
                                             1921-1922    1932-1933
Thorndike-McCall                              -.06           .04       n = 34
Stenquist assembly                             .02          -.06       n = 34
I. E. R. arithmetic                            .06           .24       n = 34
I. E. R. general clerical                     -.15           .02       n = 34
Thorndike-McCall and I. E. R. arithmetic      -.01           .13       n = 34
[illegible]                                    .13           .08       n = 24
[illegible]                                    .12          -.01       n = 24











The correlations of criterion twenty-four, average liking for
job at age 20.0 to 22.0, with the various tests are, for the boys grade group:

Boys grade group                             Test in      Retest in
                                             1921-1922    1932-1933
Thorndike-McCall                               .02           .02
Stenquist assembly                            -.24          -.11
I. E. R. arithmetic                           -.05          -.03
I. E. R. general clerical                     -.02           .04
Thorndike-McCall and I. E. R. arithmetic      -.02           .00+
[illegible]                                   -.04          -.06
[illegible]                                   -.06           .01





The correlations with criterion twenty-four are, for the boys age group:

Boys age group                               Test in      Retest in
                                             1921-1922    1932-1933
Thorndike-McCall                              -.20          -.07
Stenquist assembly                            -.00+          .03
I. E. R. arithmetic                            .11           .19
I. E. R. general clerical                     -.07           .06
Thorndike-McCall and I. E. R. arithmetic      -.06           .06
[illegible]                                    .25          -.08
[illegible]                                    .30          -.06
These correlations are so near zero that very little would be gained
by transmuting them to Fisher's z. We average the coefficients
directly, excluding the composite Thorndike-McCall and I. E. R.
arithmetic. Without regard to sign, the averages (boys grade group and
boys age group, respectively) with criterion twenty-two are .16 and .09
for 1921-1922 and .17 and .07½ for 1932-1933; with criterion twenty-four
they are .07 and .15½ for 1921-1922 and .04½ and .08 for 1932-1933.

The correlations, though low, show very little difference in
predictability between long and short time intervals.
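The averages above can be checked directly. The Python sketch below averages the six grade-group coefficients for criterion twenty-two in 1921-1922 (composite excluded, signs ignored). For comparison it also averages them through Fisher's z (z = atanh r), which shows how little the transformation changes coefficients this small.

```python
from math import atanh, tanh
from statistics import mean

# Boys grade group, criterion twenty-two, 1921-1922 tests, composite excluded.
grade_1921 = [0.10, 0.24, 0.15, 0.05, 0.30, 0.11]


def mean_abs_r(rs):
    """Plain average of correlations, ignoring sign."""
    return mean(abs(r) for r in rs)


def mean_abs_r_fisher(rs):
    """Average through Fisher's z: transform, average, transform back."""
    return tanh(mean(atanh(abs(r)) for r in rs))


print(round(mean_abs_r(grade_1921), 3),
      round(mean_abs_r_fisher(grade_1921), 3))
```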

If we had perfect measures of the various traits at age 14.0 and at
age 24.0, the correlations with criterion twenty-two, salary per year
for age 20.0 to 22.0, would be approximately:

                              Boys grade                Boys age
                          Test in     Test in      Test in     Test in
                          1921-1922   1932-1933    1921-1922   1932-1933
Thorndike-McCall            .11        -.01         -.07         .05
Stenquist assembly          .30         .29          .03        -.08
I. E. R. arithmetic         .17         .15          .07         .28
I. E. R. general clerical   .05         .10         -.16         .02

















and with criterion twenty-four, average liking for job at age 20.0 to
22.0, the correlations would approximate:

                              Boys grade                Boys age
                          Test in     Test in      Test in     Test in
                          1921-1922   1932-1933    1921-1922   1932-1933
Thorndike-McCall            .02         .02         -.23        -.08
Stenquist assembly         -.30        -.14          .00         .04
I. E. R. arithmetic        -.06        -.03          .13         .22
I. E. R. general clerical  -.02         .04         -.08         .07














If we had perfect measures of salary from age 20.0 to 22.0, the
correlations would approximate:

                              Boys grade                Boys age
                          Test in     Test in      Test in     Test in
                          1921-1922   1932-1933    1921-1922   1932-1933
Thorndike-McCall            .11        -.01         -.06         .04
Stenquist assembly          .25         .24          .02        -.06
I. E. R. arithmetic         .16         .14          .06         .25
I. E. R. general clerical   .05         .09         -.16         .02
and if we had perfect measures of average liking for job from age
20.0 to 22.0, the correlations would approximate:

                              Boys grade                Boys age
                          Test in     Test in      Test in     Test in
                          1921-1922   1932-1933    1921-1922   1932-1933
Thorndike-McCall            .03         .03         -.26        -.09
Stenquist assembly         -.31        -.14         -.00         .04
I. E. R. arithmetic        -.06        -.04          .14         .25
I. E. R. general clerical  -.03         .05         -.09         .08
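The "perfect measures" tables appear to apply the usual correction for attenuation to one variable at a time: the obtained r divided by the square root of that variable's reliability. The article does not report the reliabilities it used, so the values in this sketch are hypothetical. The sketch illustrates that a coefficient near zero stays near zero after correction.

```python
from math import sqrt


def correct_one_variable(r_xy, r_xx):
    """Correlation expected if variable x alone were measured without error:
    r_xy / sqrt(r_xx)."""
    return r_xy / sqrt(r_xx)


# Hypothetical reliabilities; the source does not give the values it used.
r_obtained = 0.10
for reliability in (0.95, 0.81, 0.64):
    print(reliability, round(correct_one_variable(r_obtained, reliability), 3))
```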
The tests at age 14.0 and at age 24.0 make for equally poor predictions
of measures of vocational success at age 20.0 to 22.0. Even if
the tests were perfect measures of the abilities sampled, and even if the
measures of vocational success were perfect, the relationships between
tests and criteria are so low that no vocational counselor
could have foretold how much money a boy would earn at age 20.0
to 22.0 or how happy he would be in his job at those ages.
SOME OBSERVATIONS AND DATA ON CERTAIN

METHODS OF MEASURING THE PREDICTIVE 
SIGNIFICANCE OF THE PEARSON PRODUCT- 
MOMENT COEFFICIENT OF CORRELATION 


HARL R. DOUGLASS 


University of Minnesota 


It is becoming common procedure to employ, as a measure of the
predictive value of a Pearson product-moment coefficient of correlation,
a value referred to as the "per cent of improvement over sheer
guess." The formula employed to obtain this value, frequently
referred to as the predictive efficiency index, is:

$$E = 1 - \sqrt{1 - r_{xy}^2}$$


The formula is arrived at as follows: If the mean of the values which
it is attempted to predict is known and no other information useful
in prediction is available, the most intelligent guess of the value for
each case, the guess with the least standard error, is the mean. The
standard error of such guesses is of course the standard deviation of
the actual values of the cases which it is attempted to predict. The
value of the standard error of the prediction, made by use of the
regression equation employing the coefficient of correlation, is

$$\sigma_{\text{est}\,y} = \sigma_y \sqrt{1 - r_{xy}^2}$$

in which y is the variable predicted from known values of x. The
ratio of the standard error when the regression equation is employed
to the standard error when the mean is "guessed" is therefore
$\sqrt{1 - r_{xy}^2}$. In other words, the standard error of estimate has
been reduced by the fraction $1 - \sqrt{1 - r_{xy}^2}$, the value of E.

Values of E for various values of $r_{xy}$ may be readily calculated.
Tables of such values have been calculated and the functional relationship
plotted by several workers. Space will be employed here
only for a few values of E for known values of $r_{xy}$, for the convenience
of those who have little experience with the formula and who have
difficulty visualizing curves from formulae. They are

    r_xy   .00   .20   .40   .60   .80   .90    …     …
    E      .00   .02   .08   .20   .40   .56   .70   .98
It is obvious that, judged by this standard, values of r must be
high to ensure prediction reasonably superior to the device of guessing
the mean for each case.
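The tabled values can be reproduced from the formula itself. The Python sketch below is only an illustration; the function name `efficiency_index` is our own label, not the author's. It evaluates $E = 1 - \sqrt{1 - r_{xy}^2}$ for the six legible values of $r_{xy}$ in the table.

```python
import math


def efficiency_index(r: float) -> float:
    """Per cent of improvement over guessing the mean: E = 1 - sqrt(1 - r**2)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    # sqrt(1 - r**2) is the ratio of the standard error of estimate
    # to the standard deviation of the variable being predicted.
    return 1.0 - math.sqrt(1.0 - r * r)


if __name__ == "__main__":
    for r in (0.00, 0.20, 0.40, 0.60, 0.80, 0.90):
        print(f"r = {r:.2f}   E = {efficiency_index(r):.2f}")
```

Rounded to two places, the printed values agree with the table: .00, .02, .08, .20, .40 and .56.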

It is the purpose of this note to call attention to certain limitations
of this method of interpreting the value of r and to suggest certain
alternative methods. In the first place, it is becoming conventional
to say that prediction of one variable from another, with which its
coefficient of correlation is known to be, let us say, .80, is only forty
per cent better than "sheer guess." This terminology is misleading,
in that knowing and employing the mean can hardly be thought of as
a procedure which is "sheer" guess. It would be better to refer to it
as a shrewd guess based upon information which we rarely have in
making predictions except in the form of approximations, namely,
the mean of the values being predicted.

As procedures to which the term sheer guess could be applied more
accurately, the following methods were employed; the resulting standard
errors of estimate were calculated and the "Efficiency Indexes" compared
with that given by the formula

$$E = 1 - \sqrt{1 - r_{xy}^2}$$


H. C. Garrett, in his Statistics in Psychology and Education,¹
furnishes on page 153 one hundred twenty pairs of values for two
variables. Two sets of one hundred twenty square slips of paper each
were prepared, and on these slips were written the values of x and y
as given by Garrett. Each set was put in a waste basket, shaken
and stirred thoroughly, and one by one pairs of numbers, one from
each basket, were drawn and the values plotted on a correlation
scattergram, neither slip being returned to its basket. The values of y
thus "predicted" were then compared with the actual values of y as given
by Garrett, and the ratio of the standard error of estimate of predictions
made by use of the regression equation to the standard error of this
type of sheer guess was calculated. As was to be expected, it was
considerably smaller than the ratio of the standard error of estimate by
use of the regression equation to the standard error of the so-called
"sheer guess" in which the mean of the distribution is guessed for each
case: the ratios were 1.24 ÷ 2.31, or .54, for random guessing and
1.24 ÷ 1.55, or .80, for guessing the mean.





1 Values for this note were taken from sources readily available to obviate 
reproducing them here. 
The procedure was repeated with the following deviation: The
number drawn from the y basket was replaced and the numbers
thoroughly stirred after each drawing. The respective ratios thus
obtained were .80 and .57.
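The slips-of-paper procedures can be imitated by simulation. The sketch below is an assumption-laden stand-in, not a re-analysis of Garrett's table. In place of his 120 pairs it draws a large synthetic bivariate-normal sample with r = .60 and predicts y by the population regression line. It then compares that standard error of estimate with the root-mean-square error of three kinds of guess: the mean, a y drawn without replacement (a shuffle), and a y drawn with replacement. Under these assumptions, the mean-guess ratio should come out near √(1 − .60²) = .80. Both random-guess ratios should come out near .80/√2 ≈ .57, because the difference between two independent draws has twice the variance of one. The function names are hypothetical.

```python
import math
import random


def rms(errors):
    """Root-mean-square of a list of prediction errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def guess_ratios(r=0.60, n=100_000, seed=1):
    """Regression standard error of estimate divided by the RMS error of
    three kinds of guess, on synthetic standardized bivariate-normal data."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # y has (population) unit variance and correlation r with x.
    ys = [r * x + math.sqrt(1.0 - r * r) * rng.gauss(0.0, 1.0) for x in xs]

    se_regression = rms([y - r * x for x, y in zip(xs, ys)])

    mean_y = sum(ys) / n
    se_mean = rms([mean_y - y for y in ys])

    # First type: y values drawn without replacement (a random shuffle).
    shuffled = ys[:]
    rng.shuffle(shuffled)
    se_without = rms([g - y for g, y in zip(shuffled, ys)])

    # Second type: each y value replaced after drawing (with replacement).
    drawn = [rng.choice(ys) for _ in ys]
    se_with = rms([g - y for g, y in zip(drawn, ys)])

    return {
        "mean": se_regression / se_mean,
        "without replacement": se_regression / se_without,
        "with replacement": se_regression / se_with,
    }


if __name__ == "__main__":
    for kind, ratio in guess_ratios().items():
        print(f"{kind:>20}: ratio {ratio:.2f}, efficiency index {1 - ratio:.2f}")
```

A sample of 120 pairs will naturally scatter more around these values than a synthetic sample of 100,000 does.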

Another approach was also used. The mean of the values to be
predicted is rarely known accurately, though in many situations
well-informed individuals may furnish very good approximations to it.
For the data under discussion, of which the standard deviation is
7.75, let us assume that the guessed mean deviates from the actual
mean by five. The ratio of the standard error of estimate when the
regression equation is used to that of "sheer guess" predictions made
by assuming this guessed mean to be the value of each case is .70.
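A rough check is possible under a simplifying assumption of our own: treat the data as if their sample moments were exact population values. Guessing a constant that misses the true mean by d then gives a root-mean-square error of √(σ² + d²). The ratio therefore becomes √(1 − r²)/√(1 + (d/σ)²). For the Garrett figures (r = .60, d = 5, σ = 7.75), this approximation gives about .67, near the .70 obtained from the actual table. The function name below is hypothetical.

```python
import math


def offset_mean_ratio(r: float, offset: float, sd: float) -> float:
    """Regression standard error of estimate divided by the RMS error of
    guessing a constant that misses the true mean by `offset`.

    Uses population formulas: SE_est = sd * sqrt(1 - r**2) and
    RMS(guess error) = sqrt(sd**2 + offset**2).
    """
    se_estimate = sd * math.sqrt(1.0 - r * r)
    se_guess = math.sqrt(sd * sd + offset * offset)
    return se_estimate / se_guess


if __name__ == "__main__":
    ratio = offset_mean_ratio(r=0.60, offset=5.0, sd=7.75)
    print(f"ratio {ratio:.2f}, efficiency index {1 - ratio:.2f}")
```

With an offset of zero the function reduces to √(1 − r²), the ratio for guessing the true mean.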

These calculations were repeated on similar tables of correlated
values furnished by Karl J. Holzinger in his Statistical Methods for
Students in Education, page 152, and by E. M. Draper and A. C.
Roberts in Principles of Secondary Education, page 414. The
resulting ratios and E's are shown in the following table:

                                                 Ratio   Efficiency
                                                           index
Garrett data (r = .60)
  Mean as "sheer" guess                           .80       .20
  First type of random guess                      .54       .46
  Second type of random guess                     .57       .43
  Guessed mean (error = .65σ) as "sheer" guess    .70       .30
Holzinger data (r = .78)
  Mean as "sheer" guess                           .63       .37
  First type of random guess                      .56       .44
  Second type of random guess                     .54       .46
  Guessed mean (error = .9σ) as "sheer" guess     .50       .50
Draper and Roberts data (r = .50)
  Mean as "sheer" guess                           .87       .13
  First type of random guess                      .56       .44
  Second type of random guess                     .54       .46
  Guessed mean (error = .77σ) as "sheer" guess    .70       .30

It is obvious that the use of the regression equation for predicting
one variable from another, the correlation of which with the first is
known, yields materially more accurate estimates than truly "sheer"
guess, even though the correlation between the two variables be no
greater than .50. The advantage is, if anything, understated, inasmuch
as the experimental situations discussed here involve random guesses,
and even these are improvements over real sheer guess, being limited
as they are to the actual values of the variable being predicted.

As Segel¹ has pointed out, for a great many situations in educational
administration and guidance, e.g., homogeneous grouping, in which
predictions are made of probable success from known values of such
variables as IQ, MA, previous marks, etc., it is not wise to employ as a
measure of predictive efficiency the standard error of estimate of
fallible marks; a more logical measure is the standard error of
prediction of "true" or completely reliable scores. In predictions
of success for the purposes of educational or vocational
guidance, homogeneous grouping and similar situations, one is not
concerned with predicting the actual mark which the student will
make, except as it is an approximation of his achievement or progress.
Marks are known to be to some extent unreliable, containing chance
errors as well as more or less systematic errors leading to lack of
validity. One is not permitted to estimate mathematically, from a
correlation between two sets of fallible measures, what the correlation
would be between the first set of measures and valid measures of the
second variable. One can, however, estimate with considerable
chance of accuracy the correlation between the first set of measures
and completely reliable or "true" measures of the second variable,
and one may calculate very useful estimates of the correlation between
completely reliable measures of both variables. The means for these
estimates are the following formulae for the correction for attenuation
of coefficients of correlation between fallible measures:

$$r_{x_\infty y_\infty} = \frac{r_{xy}}{\sqrt{r_{x_1x_2}\, r_{y_1y_2}}} \qquad \text{and} \qquad r_{x y_\infty} = \frac{r_{xy}}{\sqrt{r_{y_1y_2}}}$$

in which $r_{x_\infty y_\infty}$ is the estimated or corrected coefficient between
perfectly reliable measures, $r_{x y_\infty}$ is the estimated coefficient between
the fallible measures of x and perfectly reliable measures of y, $r_{xy}$ is
the obtained coefficient between the fallible or unreliable measures, and
$r_{x_1x_2}$ and $r_{y_1y_2}$ are the coefficients of reliability of the tests or
other means of measurement of x and y respectively.
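The two correction formulae translate directly into code. The sketch below is a minimal illustration, and the function names are ours. It applies the formulae to the figures of the two numerical examples discussed in the text: an obtained r of .63 with a criterion reliability of .81, and an obtained r of .36 with reliabilities of .64 and .49.

```python
import math


def correct_both(r_xy, rel_x, rel_y):
    """Estimated r between perfectly reliable measures of x and of y."""
    return r_xy / math.sqrt(rel_x * rel_y)


def correct_criterion_only(r_xy, rel_y):
    """Estimated r between the fallible x and perfectly reliable measures of y."""
    return r_xy / math.sqrt(rel_y)


print(f"{correct_criterion_only(0.63, 0.81):.2f}")   # prints 0.70
print(f"{correct_both(0.36, 0.64, 0.49):.2f}")       # prints 0.64
```

Only the one-sided form, which corrects the criterion alone, is appropriate when judging a basis for prediction, since the predictions themselves must be made from the fallible measures.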

The standard error of estimate of "true" measures is always less
than that for fallible measures. Hence, when evaluated by the more
logical criterion, the predictive efficiency of one variable for another is
greater than that indicated by the uncorrected coefficient of
correlation. If, for example, an r of .63 is obtained for intelligence and
school marks and the reliability of the marks is no greater than .81,
the coefficient corrected for unreliability of marks is

$$r_{x y_\infty} = \frac{.63}{\sqrt{.81}} = .70,$$

and the Efficiency Index is larger by about one-fourth than that
suggested by the uncorrected coefficient.

1 Segel, David: A Note on an Error Made in Investigations of Homogeneous
Grouping. The Journal of Educational Psychology, Vol. XXIV, January, 1933,
pp. 64-66.

If the situation is that of measuring the relationship that exists
between two traits and one does not have perfectly reliable measures,
the use of the obtained uncorrected coefficient is clearly incorrect.
Far too many workers have obtained what seemed to be evidence of
lack of relationship between two traits. In a number of these
situations, because of the lack of reliability of the means of measuring
the two traits, no very high coefficient of correlation could possibly
have been obtained even if the two traits were identical or otherwise
closely correlated. If, for example, the coefficients of reliability of
two measuring instruments are .64 and .49 and the intercorrelation
is .36, the coefficient of correlation which would have been obtained
for the group had perfectly reliable measures been used is certainly
greater than .36 and may be estimated to be

$$r_{x_\infty y_\infty} = \frac{.36}{\sqrt{.64 \times .49}} = .64.$$

One should not forget, however, that in evaluating bases for predic- 
tion, since one must make his predictions on the basis of fallible meas- 
ures, correction for attenuation in the variable from which predictions 
are made is not permissible. 
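The statement that correcting .63 to .70 raises the Efficiency Index by about one-fourth can be checked with the formula for E. This sketch simply evaluates $E = 1 - \sqrt{1 - r^2}$ at the obtained and at the corrected coefficient of the intelligence-and-marks example. Only the criterion (marks) is corrected, in keeping with the caution just stated.

```python
import math


def efficiency_index(r: float) -> float:
    """E = 1 - sqrt(1 - r**2), the 'per cent of improvement' over guessing the mean."""
    return 1.0 - math.sqrt(1.0 - r * r)


r_obtained = 0.63
r_corrected = r_obtained / math.sqrt(0.81)   # marks (the criterion) have reliability .81

e_obtained = efficiency_index(r_obtained)
e_corrected = efficiency_index(r_corrected)
print(f"E for r = {r_obtained:.2f}: {e_obtained:.3f}")
print(f"E for r = {r_corrected:.2f}: {e_corrected:.3f}")
print(f"relative increase: {e_corrected / e_obtained - 1:.0%}")
```

E rises from about .22 to about .29, an increase of roughly 28 per cent, which the text rounds to one-fourth.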

Another error quite commonly made consists in employing, as a
measure of relationship and of predictive efficiency, the coefficient
of correlation for variables between which the relationship is not linear.
The correlation ratio, which should always be employed instead of the
coefficient of correlation with data yielding non-linear relationships,
is systematically greater than the coefficient of correlation and is
a much more accurate measure of relationship than is the latter. The
reader may find in Garrett's text, previously referred to, on page 207
a scattergram for which r is .80 while the correlation ratio is .93.
The error of assuming, without investigating the possibility of
non-linear relationships, that the degree of relationship is always
represented by the coefficient of correlation, while inexcusable in published
investigations, is not rare.
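The sketch below shows how far r can understate a non-linear relationship. It computes both r and the correlation ratio η (of y on x) for a small, invented U-shaped table. Here η is taken as the square root of the ratio of the between-columns sum of squares to the total sum of squares, with each distinct value of x serving as one column of the scattergram; with grouped data the columns would be class intervals. The column means fit the data at least as well as any straight line can, so η can never be smaller than |r|. The data and function names are hypothetical.

```python
import math
from collections import defaultdict


def pearson_r(xs, ys):
    """Pearson product-moment coefficient of correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_ratio(xs, ys):
    """Correlation ratio (eta) of y on x: sqrt(between-column SS / total SS),
    treating each distinct x value as one column of the scattergram."""
    columns = defaultdict(list)
    for x, y in zip(xs, ys):
        columns[x].append(y)
    grand_mean = sum(ys) / len(ys)
    ss_total = sum((y - grand_mean) ** 2 for y in ys)
    ss_between = sum(
        len(col) * (sum(col) / len(col) - grand_mean) ** 2
        for col in columns.values()
    )
    return math.sqrt(ss_between / ss_total)


# An invented U-shaped relationship: y is high at both ends of x.
xs = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
ys = [3.5, 4.5, 0.5, 1.5, -0.5, 0.5, 0.5, 1.5, 3.5, 4.5]
print(f"r   = {pearson_r(xs, ys):.2f}")          # prints r   = 0.00
print(f"eta = {correlation_ratio(xs, ys):.2f}")  # prints eta = 0.96
```

For these ten pairs r is exactly zero while η is about .96. The coefficient of correlation alone would suggest no relationship at all.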

Much better than the "Efficiency Index" as a measure of predictive
efficiency, in homogeneous grouping or other situations in which
individuals are placed in groups according to probable achievement,
ability to achieve, or other variables, is the proportion of individuals
properly allocated and the proportions misplaced by one, two
or more groups. For example, when pupils are divided into three groups,
one containing the upper twenty per cent, one the lower twenty per cent,
and the other the middle sixty per cent, the proportion properly placed
on the average on the basis of a variable or combination of variables
correlating .80 with the predicted variable is 71.6 per cent, and for .60
it is 60.4 per cent, while by chance only 44.0 per cent would be properly
placed. The percentages misplaced by two groups are .1 per cent,
1 per cent, and 8 per cent, respectively. Similar calculations may be
made from Thorndike's tables of displacements for various values of r.
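The chance figures follow by arithmetic. Correct placement by chance is .20² + .60² + .20² = .44, and misplacement by two groups is 2 × .20 × .20 = .08. For other values of r, the sketch below estimates the proportions by simulation, assuming that both variables are normally distributed. That assumption is ours, and the author's figures may rest on Thorndike's tables instead, so agreement with the 71.6 and 60.4 per cent quoted above should be expected to be approximate at best. The function name is hypothetical.

```python
import math
import random
from statistics import NormalDist


def placement_rates(r, n=100_000, seed=7):
    """Estimate, by simulation, the proportion of pupils placed in the right
    group and the proportion misplaced by two groups, when groups are formed
    on a predictor x and success is measured by y (bivariate normal, correlation r).

    Groups: lower 20 per cent, middle 60 per cent, upper 20 per cent.
    """
    rng = random.Random(seed)
    lo = NormalDist().inv_cdf(0.20)   # upper boundary of the lower 20 per cent
    hi = NormalDist().inv_cdf(0.80)   # lower boundary of the upper 20 per cent

    def group(z):
        if z < lo:
            return 0
        return 2 if z > hi else 1

    correct = two_off = 0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = r * x + math.sqrt(1.0 - r * r) * rng.gauss(0.0, 1.0)
        gap = abs(group(x) - group(y))
        if gap == 0:
            correct += 1
        elif gap == 2:
            two_off += 1
    return correct / n, two_off / n


if __name__ == "__main__":
    for r in (0.80, 0.60, 0.00):
        right, far = placement_rates(r)
        print(f"r = {r:.2f}: {right:.1%} properly placed, "
              f"{far:.1%} misplaced by two groups")
```

For r = 0 the simulation should return values close to the .44 and .08 computed above.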

This idea may be carried a step further. It happens frequently
that a variable correlating less closely than another with the variable
which it is desired to predict is really more useful in determining who
shall fall above or below a certain value of the predicted variable.
It should often be possible to locate "critical" points which enable one
to classify individuals into two groups, on some such basis as
satisfactory and unsatisfactory, with more accuracy than the magnitude
of the coefficient of correlation suggests.

One more type of error should be mentioned. It is not rare that a
fairly high degree of relationship will exist between two variables in a
restricted range or ranges of one or both variables and a very loose
relationship in other ranges, as may be illustrated in the following
scattergram:

[Scattergram]

Obviously the coefficient of correlation for the scattergram is too small
to represent the relationship between y and lower values of x, and
too great to represent the correlation between y and larger values of x.
The point is made to demonstrate the necessity of examining scattergrams
to discover to what extent it is possible to represent the relationship
in all ranges of the two variables by a single coefficient.
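Invented data can illustrate the point. In the sketch below, the data-generating rule is purely hypothetical: y tracks x closely in the lower half of the range of x and only loosely in the upper half. The single coefficient for the whole range, roughly .6 under these assumptions, falls between the within-range coefficients of roughly .99 and .3. It overstates the one relationship and understates the other.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson product-moment coefficient of correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(3)
xs = [rng.random() for _ in range(20_000)]
# Hypothetical data: y follows x closely for x < .5 and only loosely above it.
ys = [x + rng.gauss(0.0, 0.02 if x < 0.5 else 0.5) for x in xs]

lower = [(x, y) for x, y in zip(xs, ys) if x < 0.5]
upper = [(x, y) for x, y in zip(xs, ys) if x >= 0.5]

r_all = pearson_r(xs, ys)
r_lower = pearson_r(*zip(*lower))
r_upper = pearson_r(*zip(*upper))
print(f"whole range r = {r_all:.2f}, lower range r = {r_lower:.2f}, "
      f"upper range r = {r_upper:.2f}")
```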

The coefficient of correlation has come to be, along with the mean,
median, probable error, standard deviation and the quartiles, one of
the most frequently used statistical constants. It easily takes front
rank in the frequency with which it is misinterpreted, especially for
purposes of evaluating variables on the basis of their predictive
efficiency. It is not the purpose of this communication to make clear,
in so limited a discussion, the techniques and theory involved, but rather
to call these matters to the attention of that appallingly large number
of workers who, in attempting to throw light on various educational
problems, fail to realize the possibilities and limitations of the technique
they are employing and so are prone to waste much good paper and
printers' ink as well as the time and energy of their readers. Worse still,
they are actually distributing among workers in the field and fellow
investigators error disguised as scientific truth, a disservice which
unfortunately will come home to roost not only for the perpetrator but
for statistical method.
"THE FACTOR THEORY AND ITS TROUBLES":
MISREPRESENTATION OF A CRITICISM OF 
THE THEORY 


ROBERT C. TRYON 
University of California 


In view of the complexity of the genetic and experiential causes
of individual differences in abilities, the theory of a universal "g"
with its group and unique specific factors appears fictitiously simple.
In 1932 I published ten sets of tetrad analyses which aimed to discover
whether a "g" with unique specifics¹ or a "g" with a few group
factors plus unique specifics² actually was consistent with
experimental evidence. Consistency was not found.

Surprising it is, then, to read Professor Spearman's recent paper in
the November, 1933, Journal of Educational Psychology, and
there to find his claim that critics have misapprehended his theory
by inferring that he does not admit group factors along with
his "g." In his article, under theorems (2) and (2a) and in Fig. 2,
Spearman presents his envisagement of group factors. He states that,
as for my own criticism of his theory, I conveyed "no indication that
the theory with even greater confidence 'expects' the case given in
(2), (2a) and Fig. 2." Indeed, so fully aware of this case was I that
I devoted my entire second paper to the case of group factors; in
fact, the very title of the paper so designates it. In this article I
listed forty-one group and general factors proposed by Spearman
himself and by his students. As further evidence of my cognizance
of his group factors, one will note that I appended to each enumerated
group factor a reference to an article by Spearman or his students in
which the factor in question was proposed. Furthermore, under the
subcaption "Expected forms of T [the triplet-set of tetrad differences]
on the assumption of 'g,'" the case of Spearman's Fig. 2 is shown in
equational form in equations (3) and (5), and the more general case
of numerous overlapping group factors is given in the first paper in
equation (6). In place of "g" I wrote another symbol, "a," but this
slight modification should not, of course, confuse Professor Spearman.
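The sketch below is offered for readers unfamiliar with the criterion. It uses the standard textbook form of the three tetrad differences for four tests. It is not a transcription of Tryon's equations (3), (5) or (6), whose notation is not reproduced here. With a single general factor, every correlation is the product of two loadings, so all three differences vanish. A group factor shared by two of the tests makes two of the differences non-zero. The loadings and function names are invented for illustration.

```python
def implied_correlations(general, group=None):
    """Correlations implied by loadings on one general factor plus, optionally,
    one group factor (loading 0.0 for tests the group factor does not reach)."""
    n = len(general)
    group = group if group is not None else [0.0] * n
    return [
        [1.0 if i == j else general[i] * general[j] + group[i] * group[j]
         for j in range(n)]
        for i in range(n)
    ]


def tetrad_differences(r):
    """The three tetrad differences for tests 0-3 of a correlation matrix r."""
    a = r[0][1] * r[2][3]
    b = r[0][2] * r[1][3]
    c = r[0][3] * r[1][2]
    return (a - b, a - c, b - c)


# Hypothetical general-factor loadings for four tests.
g = [0.9, 0.8, 0.7, 0.6]
only_g = tetrad_differences(implied_correlations(g))
with_group = tetrad_differences(implied_correlations(g, [0.3, 0.3, 0.0, 0.0]))
print("g only:        ", [f"{t:.4f}" for t in only_g])
print("g plus a group:", [f"{t:.4f}" for t in with_group])
```

The first line is zero to within floating-point rounding. In the second, the two differences involving r₀₁ equal the group-factor contribution .3 × .3 × .7 × .6 = .0378.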





1 Multiple factors vs. two factors as determiners of abilities. Psychol. Rev., 
Vol. XXXIX, 1932, pp. 324-351. 


2 So-called group factors as determiners of abilities. Psychol. Rev., Vol. XXXIX,
1932, pp. 403-439.
This writer is willing to agree that the theory of "g" with its
retinue of group and unique factors is tenable in the sense that any
arm-chair hypothesis is tenable. The troublesomeness lies not in the
misapprehension of critics, but in their realization of the fact, as shown
for example in my own review, that such an hypothesis is not supported
by thorough-going tetrad analyses. The tetrad-difference criterion
is, in fact, an inferior device for testing the more complex expression
of the theory. More adequate techniques which lead to a test for
statistical consistency are necessary,¹ but Spearman and his students
have not employed them.





1 See, for example, the improved techniques described in the following
references: Thurstone, L. L.: A simplified multiple factor method and an outline
of the computations. Chicago: Author, 1933, Pp. 25. Kelley, T. L.: Crossroads in
the mind of man. Stanford Univ. Press, 1928, Pp. 238. Hotelling, H.: Analysis of
a complex of statistical variables into principal components. J. Educ. Psychol.,
Vol. XXIV, 1933, pp. 417-441, 498-520.
BOOK REVIEWS


Susan Isaacs. Social Development in Young Children. New York:
Harcourt, Brace & Company, 1933, pp. XII + 480. 


Unless the reader who has been attracted by the title of this volume 
is an ardent disciple of Freud, he is sure to be disappointed by the 
author’s narrow conception of the young child’s social development 
which is made to hinge entirely around problems of infantile sexuality. 
The book is the second in a series of three volumes planned by the 
author and follows her Intellectual Growth in Young Children which was 
published in 1930. 

The approach to this highly important aspect of child development 
is admittedly qualitative, and based on voluminous ‘‘records” and 
notes made on the behavior of the children in the Malting House 
School in London (of which the author is the director) and on corre- 
spondence with mothers and nurses regarding certain problems of child 
training. The interpretation of these ‘‘records’’ is decidedly Freudian 
and the author’s theoretical considerations are largely influenced by 
the writings of Klein and Searl of the psychoanalytic school. 

Dr. Isaacs is of the opinion that "experimental methods have . . .
proved enormously fruitful in the study of intellectual growth, of 
learning and of language. But in the field of social development they 
are almost inapplicable.” After citing the distinction which Anderson 
makes in the Handbook of Child Psychology between systematic observa- 
tion and incidental observation, she attempts to claim a place for her 
“records” in the former category, whereas in reality they have been 
jotted down incidentally in diary fashion, admittedly without pre- 
liminary selection of the events to be observed. 

The bibliography of seventy-eight titles, representing the work of 
twenty-six authors, is decidedly psychoanalytic in character, and the 
many excellent observational and experimental approaches to the 
problem of social development, as well as those concerned with 
the evaluation of new techniques of observation and measurement in 
this field, are conspicuous by their absence. 

Dorothea McCarthy.
Fordham University. 


Herbert Blumer. Movies and Conduct. New York: The Macmillan
Co., 1933, pp. XIV + 257. 


Movies and Conduct is a Payne Fund Monograph prepared as part 
of the Motion Pictures and Youth Series. It is a study of personal 
accounts of youthful experiences with motion pictures. Guided
autobiographical memory and introspective reports of interests, emotional
experiences, and behavior accounts prepared by college students were
analyzed to determine the effect of movies on the subjects in the
spheres mentioned. Evidence of influences resulting in impersonation,
imitation, fantasy, emotional possession, emotional detachment, and
acceptance of schemes of life was found. Reports by college students
were compared with specific answers to questions supplied by twelve
hundred sixth and seventh grade school children. Influence of motion
pictures on play is described as "great"; as a source of imitation,
"considerable," especially in imitation of mannerisms, poses, forms of
beautification, ways of courtship, and ways of lovemaking. The
study is regarded by the author as being exploratory in character.


Glen U. Cleeton.
Carnegie Institute of Technology. 


S. L. Pressey. Psychology and the New Education. New York:
Harper and Brothers, 1933, pp. XXXI + 594.


The present review is being written by a man who not only has 
read the book, but has also used it in class. Its reception by the 
students was enthusiastic, one of them going so far as to say, "If I
don't get another thing all year, I'll be satisfied."

Anyone who is familiar with the work of S. L. Pressey will agree 
that it is above all practical. And in attempting to analyze the rea- 
sons for the book’s popularity, the reviewer has come to the conclusion 
that this is primarily due to the fact that the material is concrete 
and useful. 

This success may in large part be explained by the fact that the 
author knew what to leave out. There is no repetition of the material 
of general psychology, and the student consequently feels that he is 
breaking new ground. The author seems also to feel that the average 
teacher does not need to know what effect various drugs have on 
human efficiency. There has been an avoidance of controversy; 
consequently we find no speculation about instincts. Since the text 
is for undergraduates, there has been no attempt to present an exhaustive
survey of research. The general conclusions are stated simply and
concretely and are accompanied by illustrative data. The student is
therefore not lost in a maze of somewhat conflicting statistics. The
impression must not be left, however, that the book is a superficial
example of arm-chair platitudes. It is evident to the technical reader
that the author has made an exhaustive study of the research in the
field, and has done an admirable job in leading his readers beyond 
percentiles and critical ratios and in causing them to see children. 

Other distinctive features of the book should be mentioned. 
Attention is given to the problems of physical development and health. 
In addition to the material on sensory defects, there is a consideration 
of malnutrition and measles. There is a study of the peculiarities 
of child society, and the effect of different types of home environment. 
We find data in regard to the play interests of children of different 
ages, and in regard to their movie preferences. The case study method 
is stressed in the hope that the individual child will appear. The
writer's special study of abnormal psychology appears to good advantage
in his chapter on emotional stress. Environmental influences
are stressed to a greater extent than is often the case, this viewpoint 
being of special interest since it comes from a man who has long been a 
leader in the development of intelligence testing. The learning curves 
are concerned, not with ball tossing, but with the acquisition of school 
subjects. Attention is given to the permanence of learning over long 
periods. 

Four kinds of supplementary material are provided: a Teacher's
Manual, a series of objective tests, Class Exercises and Experiments,
and the more informal Exercises in Application.

Melvin Rigg.
Kenyon College.


Algernon Coleman, Compiler. An Analytical Bibliography of
Modern Language Teaching, 1927-1932. Chicago: University
of Chicago Press, 1933, pp. 296.


The desirability of further research in the methodology and
materials of modern foreign language teaching, in the especial fields
of correlation and transfer of training and of language study as a
source of social insight, has become more readily apparent with the
recent publication of "An Analytical Bibliography of Modern Language
Teaching, 1927-1932," by Algernon Coleman.

In this volume, encyclopedic in its scope, Mr. Coleman and his 
colleagues have summarized five hundred seventy books, articles and 
manuscripts, and have classified them under the headings: "Psychology
of Learning," "General Trends in Language Teaching,"
"Aims, Materials and Methods," "Tests and Testing," etc.
Assuredly the survey is inclusive. The location of the resumes
of the Publications of the American and Canadian Committees on
Modern Languages, at the beginning of the volume, is strategic,
although placing of discussions of the Coleman Report in this section
rather than in the section on "Aims, Materials and Methods" would
seem to give them disproportionate emphasis. A first hand acquaint- 
ance with approximately half of the books and articles treated, 
leaves no question in the mind of the reviewer as to the ample justice 
paid them, almost without exception. 

The purpose of the work is “to make accessible to modern language 
teachers summaries of the articles, books and reports of investigations 
during the five-year period, which throw light on current opinion and 
practice, or yield new data by which the teaching of the future will be 
affected." For the language teacher, however, an extensive pruning
of the material reviewed would perhaps have rendered the work more 
helpful, inasmuch as a high percentage of the articles involves re-state- 
ment of ideas better expressed elsewhere, duplication of experimental 
procedures lacking in significant or new data, statistical findings of 
questionable value and slight interest. But for the graduate student 
and research scholar, this mass of predigested material should be a 
very real time saver, both in the matter of eliminating the necessity 
of reading numerous books and articles with misleading titles or 
of slight value, and through preventing further re-statement and 
duplication. 

A perusal of the five hundred seventy summaries leaves one not 
without sympathy for the opinion of Huse, who indicates that there 
is among language teachers no agreement as to what to teach, how to 
teach it, or for what purpose it should be taught. 

In contrast to this lack of agreement on basic principles, one is 
struck by the great similarity in the research procedures of the past 
few years. There is, for example, a preponderance of tabulated 
statistical material. There are frequency counts of words, idioms, 
personal pronouns, verb tenses, derivatives, grammatical errors, 
cultural allusions, tense nomenclatures, and further statistics resulting 
from applications of the findings involved in these counts to textbooks, 
to objective tests, to teaching devices. There is also an abundance 
of tabulation relative to the teaching population and to the student 
population, involving the percentages of failures, the time devoted to 
language study, the different languages taught in public and private 
schools in various sections and cities and in the country as a whole. 
While some of this research should lead to significant improvement,
particularly in construction of textbooks, other portions seem more
or less in the nature of "busy work."

That unreliable arcana of introspection, the questionnaire, con- 
tinues much in evidence as a variant form of tabulation, although 
its use in determining objectives, teaching procedures and student 
accomplishment, has declined. It is employed more particularly in 
such aspects as why students do or do not continue their study of a 
foreign language, what students hope to attain or feel they did not 
attain to a sufficient degree in foreign language study, which textbooks 
are more desirable or popular. 

Impressive, and in many instances significant, are statistical 
findings resulting from the extensive construction and use of objective 
tests. This field of tests and testing, regarded with so much distrust 
by linguists a dozen years ago, shows a mushroom development for 
measuring language aptitude and every known language skill. 

Two fields in which it seems further study and research would be
highly profitable, however, are those of "Correlation and Transfer
of Training" and those aspects of "Aims, Materials and Methods"
which treat foreign language as a source of insight, of broadened
outlook, of national and international understanding. It would seem,
in fact, that these fields might be combined: that transfer of training
should be considered primarily in terms of enrichment of concepts
involving American and foreign cultures, rather than in Thorndikean
identical elements sought in terms of grammatical skills, English
usage, etc.; that a most significant form of transfer is that which,
through discovering familiar meanings in new contexts, contributes
to the student's ability to evaluate more intelligently the social forces
which have shaped and are shaping the destinies of his own nation
and of other nations. This stress on insight is, of course, implied in
the reading aim, and is frequently paramount in the treatment of
realia. It is also emphasized by such advocates of Kulturkunde as
Aronstein and Otto. There seems, however, insufficient application
of this point of view, either in determining the reading experiences
of the pupils or in developing breadth of outlook in the teachers.
The preponderance of emphasis continues to fall on the mechanics of
learning, to the neglect of foreign language as a source of Erkenntnis
and Weltanschauung.

Marron J. Hay.
Florida State College for Women. 
Arthur T. Jersild. Child Psychology. New York: Prentice-Hall,
Inc., 1933, pp. XIII + 462.


In contrast to many of the recent texts which have appeared in the 
field of child psychology, this work is not a general adult psychology 
revamped, superimposed upon, and restricted to, lower age levels. 
It is, rather, an interesting compendium presenting in a readable style 
much of the factual material on young children, which has been 
emerging so rapidly in the past few years from research laboratories. 
The author, who has contributed prolifically to research in this field, 
has adhered closely to the results of experimental and observational 
studies that obviously have been his chief source. The frequent 
introduction of apt illustrative material from child life makes it 
apparent that the writer has direct knowledge of young children as a
firsthand observer, and not as an armchair student of the observations
of others. The material, however, is kept on an objective basis, and 
does not revert to the level of sentimentality so frequently found in 
books on young children. 

Although the emphasis is placed on the child of preschool years, the 
school-age child is given due consideration. Physical development
receives minor emphasis and is treated for the most part in the section
on the motor development of the infant. The psychological aspects
of child development, particularly emotional and social
development, learning, and the growth of understanding are adequately 
handled. An excellent chapter is devoted to the measurement and 
prediction of individual differences in mental ability, and the conclud- 
ing chapter gives a brief and sane discussion of some of the applications 
of the results of experimental investigation to the practical problems 
of child care and training. 

An excellent bibliography of accurate citations, limited chiefly to 
studies actually discussed in the text, follows each chapter. Most 
of the references are of recent date, and many are to unpublished 
research and to studies in press. This book is worthy of serious 
consideration as a text by anyone teaching a course in child psychology 
at the college level, and it would be an excellent addition to the libraries 
and reading lists of advanced groups in child study and parent
education.

Dorothea McCarthy.
Fordham University. 
Omar C. Held. An Attempt to Predict the Success of University Fresh-
men in Their Adjustment to Scholastic Work. Ann Arbor: Edwards 
Brothers, 1933, pp. IV + 50. 


This study confirms the findings of earlier investigators by showing 
that: (a) Accuracy of prediction of success in college is increased by 
using several criteria; (b) multiple correlation coefficients between 
.60 and .70 are about the highest that may be expected between college 
marks and such factors as scores on intelligence, subject-matter
placement, and reading tests, the Personal Inventory, and high school
standing. The study also reports the relative prognostic value of several
forms of pre-admission information such as age, color, father’s occupation,
type of high school last attended, nativity, and health. Bibliography.

Glen U. Cleeton.
Carnegie Institute of Technology. 


Roy O. Billett. Provisions for Individual Differences, Marking, and
Promotion. Washington: Government Printing Office, 1933, pp. 
XI + 471. 


Replies by eighty-five hundred ninety-four school principals to a 
one-page inquiry emanating from the Office of Education, Department
of the Interior, are analyzed in this report. Homogeneous grouping is in
use in twenty-seven hundred forty of the reporting schools, but unusual
success is claimed by only seven hundred twenty-one. Provision for
individual differences through special classes, the Morrison plan, the 
Dalton plan, the Winnetka technique, unit assignments, variations in 
promotion, marking systems, and pupil schedule practices are discussed 
from the viewpoints of both theory and reported practice. The report
bristles with statistical evidence which indicates that teaching proce- 
dures in use in the majority of high schools do not make adequate 
allowance for individual differences in the abilities of pupils. 

Glen U. Cleeton.
Carnegie Institute of Technology. 